Sommaire du brevet 1236577

(12) Brevet:	(11) CA 1236577
(21) Numéro de la demande:	1236577
(54) Titre français:	APPAREIL ET METHODE D'ADAPTATION ACOUSTIQUE
(54) Titre anglais:	APPARATUS AND METHOD FOR PERFORMING ACOUSTIC MATCHING
Statut:	Durée expirée - après l'octroi

Données bibliographiques

(51) Classification internationale des brevets (CIB):	G10L 15/10 (2006.01) G10L 15/05 (2013.01) G10L 15/187 (2013.01)
(72) Inventeurs :	BAHL, LALIT R. (Etats-Unis d'Amérique) MERCER, ROBERT L. (Etats-Unis d'Amérique) DEGENNARO, STEVEN V. (Etats-Unis d'Amérique)
(73) Titulaires :	INTERNATIONAL BUSINESS MACHINES CORPORATION
(71) Demandeurs :	INTERNATIONAL BUSINESS MACHINES CORPORATION (Etats-Unis d'Amérique)
(74) Agent:
(74) Co-agent:
(45) Délivré:	1988-05-10
(22) Date de dépôt:	1985-11-06
Licence disponible:	S.O.
Cédé au domaine public:	S.O.
(25) Langue des documents déposés:	Anglais

Traité de coopération en matière de brevets (PCT):	Non

(30) Données de priorité de la demande:

Numéro de la demande	Pays / territoire	Date
672,974	(Etats-Unis d'Amérique)	1984-11-19

Abrégés

Abrégé anglais

ABSTRACT
The invention herein provides, in a speech recognition
system which represents each vocabulary word or a portion
thereof by at least one sequence of phones wherein each
phone corresponds to a respective phone machine, each phone
machine having associated therewith (a) at least one
transition and (b) actual label output probabilities, each
actual label probability representing the probability that a
subject label is generated at a given transition in the
phone machine, a method of performing an acoustic match
between phones and a string of labels produced by an
acoustic processor in response to a speech input, the method
comprising the steps of: forming simplified phone machines
which includes the step of replacing by a single specific
value the actual label probabilities for a given label at
all transitions at which the given label may be generated in
a particular phone machine; and determining the probability
of a phone generating the labels in the string based on the
simplified phone machine corresponding thereto.

Revendications

Note : Les revendications sont présentées dans la langue officielle dans laquelle elles ont été soumises.

The embodiments of the invention in which an exclusive
property or privilege is claimed are defined as follows:
1. In a speech recognition system which represents
each vocabulary word or a portion thereof by at least
one sequence of phones wherein each phone corresponds
to a respective phone machine, each phone machine having
associated therewith (a) at least one transition and (b)
actual label output probabilities, each actual label
probability representing the probability that a subject
label is generated at a given transition in the phone
machine,
a method of performing an acoustic match between phones
and a string of labels produced by an acoustic processor
in response to a speech input, the method comprising the
steps of:
forming simplified phone machines which includes the
step of replacing by a single specific value the actual
label probabilities for a given label at all transitions
at which the given label may be generated in a particular
phone machine; and
determining the probability of a phone generating the
labels in the string based on the simplified phone ma-
chine corresponding thereto.
42

2. A method as in Claim 1 wherein each string of
phones corresponding to a vocabulary word or predefined
portion thereof represents a phonetic baseform, the
method including the further step of:
arranging the baseforms into a tree structure in which
baseforms share a common branch for as long the
baseforms have at least similar phonetic beginnings,
each leaf of the tree structure corresponding to a com-
plete baseform.
3. A method as in Claim 2 wherein the labels are
produced at successive time intervals, the method com-
prising the further step of:
matching no more than a predefined maximum number of
labels with corresponding vocabulary words.
43

4. A method as in Claim 3 wherein labels occur at
intervals on the order of one centisecond each and
wherein the predefined maximum length is on the order
of one hundred centiseconds.
5. A method as in Claim 1 wherein the specific value
assigned to all label probabilities for a particular
label in a given phone machine is equal to at least the
maximum actual label probability over all transitions
for said particular label in the given phone machine.
6. A method as in Claim 5 wherein each phone machine
also has associated therewith (c) a length distribution
for each phone which indicates probability as a function
of the number of labels produced by a particular phone,
the method including the further step of:
converting the length distribution for each phone into
a uniform probability length pseudo-distribution which
includes the step of ascribing a uniform value to the
probability of a given phone producing any number of
labels defined in the length distributior.
44

7. A method as in Claim 6 comprising the further
steps of:
finding the minimum length of labels having a nonzero
probability; and
confining the uniform pseudo-distribution to lengths at
least as long as the minimum length, lengths less than
the minimum length being assigned a probability of zero.
8. A method as in Claim 7 including the steps of,
for each phone machine:
inputting a string of labels to the phone machine;
generating a phone match value;
generating a phone end-time distribution; and
deriving a start-time distribution for the next phone
from the end-time distribution of the current phone;
the match value and the end-time distribution being
generated in response to the inputting into the phone
machine of (a) labels and (b) the start-time distrib-
ution derived from the previous phone.

9. A method as in Claim 8 comprising the further
step of:
forming a match score for a given baseform including the
step of summing the match values for sequential phones
which represent at least a beginning portion of the
given baseform.
10. A method as in Claim 9 including the further
step of:
selecting a number of words represented by respective
baseforms which have the highest match scores.
11. A method as in Claim 10 including the further
step of:
performing a detailed acoustic match with an unsimpli-
fied phone machine for each word of the selected number
of words to determine a most probable word.
12. A method as in Claim 2 wherein each phone ma-
chine also has associated therewith (c) a length dis-
tribution for each phone which indicates probability as
a function of the number of labels produced by a par-
ticular phone, the method including the further step of:
46

converting the length distribution for each phone into
a uniform probability length pseudo-distribution which
includes the step of ascribing a uniform value to the
probability of a given phone producing any number of
labels defined in the length distribution.
13. A method as in Claim 12 wherein the specific
value assigned to all label probabilities for a partic-
ular label in a given phone machine is equal to at least
the maximum actual label probability over all transi-
tions for said particular label in the given phone ma-
chine.
14. A method of performing an acoustic match of
words in a vocabulary against a string of labels which
represent a speech input, the method comprising the
steps of:
entering as inputs to the phone machine of a given phone
(a) the string of labels and (b) a respective start-time
distribution for the given phone; and
generating (a) an end-time distribution and (b) a match
value of the given phone relative to the entered labels
based on the generated end-time distribution;
47

the match value generated for each particular phone
corresponding to the probability that said each partic-
ular phone produced the entered string of labels.
15. A method as in Claim 14 wherein said end-time dis-
tribution generating step for each given phone includes
the steps of:
determining a start-time distribution for a current
phone from the end-time distribution of the previous
phone;
setting a uniform length probability for each label
string length that (a) may have been produced by a given
phone and (b) is between a minimum length having a non-
zero probability and a predefined maximum length;
thereby defining a length pseudo-distribution that is
characterized (a) as uniform between the minimum length
and maximum length and (b) as zero for all other lengths;
setting, for each of a plurality of labels, a label
probability replacement value that a given phone machine
produces a given label at any transition in the phone
machine; and
48

determining the probability of end-times in an end-time
distribution based on label output replacement values
and start-time distribution.
16. A method as in Claim 15 including the further steps
of:
forming a tree of baseforms wherein each baseform ex-
tends from the base of the tree to a leaf thereof as a
sequence of phones, each baseform representing a word;
and
scoring the probability that a given word represented
by a respective sequence of phones matches the entered
labels, said scoring including the step of summing the
match values of phones along the respective sequence.
17. A method as in Claim 15 wherein at least some of the
start-time distributions are derived from the end-time
distribution of the most recent previous phone.
18. Apparatus for matching words with a string of in-
coming labels in a pattern recognition system, the ap-
paratus comprising:
at least one phone machine;
49

each phone machine being characterized by having (a) a
plurality of states and transition paths between states,
(b) transition probabilities T(i?j) representing the
probability of state Sj given a current state Si where
Si and Sj may be the same state or different states, and
(c) actual label probabilities wherein each actual label
probability p(yk) indicates the probability that a label
yk is produced by a given phone machine at a given
transition from one state to a subsequent state where k
is a label identifying notation; and
each phone machine including (a) means for assigning to
each yk in said each phone machine a single value p'(yk)
and (b) means for replacing each actual output proba-
bility p(yk) at each transition in a given phone machine
by the single value p'(yk) assigned to the corresponding
yk.
19. Apparatus as in Claim 18 wherein a p'(yk) for a re-
spective label is no less in value than any p(yk) for
the respective label yk in a given phone machine.
20. Apparatus as in Claim 18 wherein the length of the
string of labels generable by each phone machine is
characterized (a) as being between a minimum length and
a maximum length and (b) as having a uniform length

distribution wherein the probability for each length in
the length distribution between the minimum and maximum
length is a uniform value;
the probability of label lengths less than the minimum
length or greater than the maximum length being set as
zero.
21. Apparatus as in Claim 20 wherein said assigning
means includes means for representing the value p'(yk)
for a given yk in a given phone machine by the maximum
p(yk) value for the given yk in the given phone machine.
22. Apparatus as in Claim 21 including means for setting
the uniform value of the length probability to be equal
to the maximum probability of any length generable by
each particular phone machine.
23. Apparatus as in Claim 20 wherein said each phone
machine further includes:
first means for generating an end-time distribution for
each phone based on (a) the labels entering the phone
machine for said each phone and (b) a start-time dis-
tribution of said each phone; and
51

second means for evaluating the probability that a par-
ticular phone produced an input string of labels based
on the generated end-time distribution of the particular
phone.
24. Apparatus as in Claim 23 including means for deter-
mining each of at least some of the start-time distrib-
utions from the end-time distribution of the most recent
previous phone.
25. Apparatus as in Claim 24 wherein the minimum length
is zero, the apparatus further comprising:
means for characterizing the probability of each end
time in the end-time distribution by a value .theta.n which
is equal to the end time probability at time tn divided
by the uniform length probability; and
means for computing .theta.n for successive times tn as:
.theta.n = qn + .theta.n-1p'n
wherein p'n represents the replacement value for the
incoming string label that is to be generated at time
tn by the phone machine to yield the corresponding end
time.
52

26. Apparatus as in Claim 19 wherein said each phone
machine further includes:
first means for generating an end-time distribution for
each phone based on (a) the labels entering the phone
machine for said each phone and (b) a start-time dis-
tribution of said each phone;
second means for evaluating the probability that a par-
ticular phone produced an input string of labels based
on the generated end-time distribution of the particular
phone; and
means for determining each of at least some of the
start-time distributions from the end-time distribution
of the most recent previous phone;
wherein said first means includes, for each of a plu-
rality of end times which together form the end-time
distribution:
means for determining a sum of product terms, each
product term corresponding to (a) a start-time
probability, (b) a label length probability, and
(c) the probabilities of label outputs generated
53

to provide the label length in the product term;
and
means for adding the sums of product terms to pro-
vide a match score for the phone corresponding to
the end-time distribution.
27. A method of determining at least one word in a vo-
cabulary having the highest probability of having gen-
erated a given string of incoming labels that were
produced in response to a speech input. the method com-
prising the steps of:
characterizing each word as a sequence of phonetic ele-
ments, wherein each phonetic element has (a) a start-
time distribution of probabilities qn corresponding to
respective successive start times tn, (b) a plurality
of states between which transitions occur, (c) a plu-
rality of transition probabilities, each indicating the
probability that a given transition in a given phonetic
element occurs, (d) a plurality of actual label proba-
bilities, each actual output probability indicating the
probability that a particular phonetic element generates
a particular label at a particular transition in the
particular phonetic element; and
54

forming an approximate match for a subject word includ-
ing the steps of:
replacing all actual label probabilities associated
with a given label generated by a given phonetic
element at any transition therein with a corre-
sponding specific replacement value; and
determining for one phonetic element after another
in the subject word the probability ?n of a pho-
netic element ending at a respective one of a plu-
rality of successive end times tn as a function of
start-time distribution, the probability of the
phonetic element generating a label string of each
of various lengths, and the replacement value
p'(yk) for each respective label yk that is to be
generated by the phonetic element to produce the
incoming string of labels.
28. A method as in Claim 27 wherein the approximate
matching further includes the step of:
characterizing the label length distribution as uniform
between a minimum length and a maximum length with the
probability elsewhere being set as zero;

each ?n thereby being a function of start-time distrib-
ution, the uniform probability for each length between
the minimum length and the maximum length, and the re-
placement value p'(yk) for each respective label yk that
is to be generated by the phonetic element to produce
the incoming string of labels.
29. A method as in Claim 28 wherein the ?n determining
step includes the step of:
recursively computing successive values of .theta.n corre-
sponding to successive values of ?n divided by the uni-
form length;
wherein p'n represents the replacement value for the
incoming string label that is to be generated at time
tn by the phonetic element to yield the corresponding
end time.
30, A method as in Claim 29 including the further step
of:
limiting the number of labels examined in determining
the probability of ?n for each phonetic element to a
maximum of J labels.
56

31. A method as in Claim 30 wherein the end time proba-
bilities are determined for only the first J+K incoming
labels where K is the number of states in the corre-
sponding phonetic element.
32. A method as in Claim 29 comprising the further step
of:
combining the values for the successive .theta.n's to derive
a match value for the phonetic element corresponding
thereto.
33. A method as in Claim 28 comprising the further step
of:
combining the values for the successive ?n's to derive
a match value for the phonetic element corresponding
thereto.
34. A method as in Claim 33 comprising the further step
of:
combining match values for successive phonetic elements
in a subject word to provide a word match score.
57

35. A method as in Claim 34 comprising the further step
of:
forming a list of candidate words in order of word match
scores, at least most of the words in the vocabulary
being excluded from the formed list.
36. A method as in Claim 35 comprising the further step
of:
performing a detailed match against each candidate word
on the formed list, including the steps of determining
for each candidate word the probability that a sequence
of phonetic elements representing the candidate word
generates labels in the incoming string based on, for
each phonetic element, (a) the start-time distribution
of probabilities qn corresponding to respective succes-
sive start times tn therefor, (b) the plurality of
states between which transitions occur therefor, (c) the
plurality of transition probabilities therefor, each
indicating the probability that a given transition in a
given phonetic element occurs, and (d) the plurality of
actual label probabilities therefor, each actual output
probability indicating the probability that the partic-
ular phonetic element generates a particular label at a
58

particular transition in the particular phonetic ele-
ment.
37. A method as in Claim 36 comprising the further step
of:
performing a language model match on candidate words on
the formed list.
38. A method as in Claim 33 comprising the further step
of:
selecting the specific replacement values so that the
match value for each phonetic element is an overestimate
of a match value generated if the actual probabilities
were not replaced with the replacement values.
39. A method as in Claim 28 wherein the minimum length
is zero and wherein the ?n determining step includes the
step of:
recursively computing successive values of ?n corre-
sponding to successive values of ?n divided by the uni-
form length as:
.theta.n = qn + .theta.n-1p'n
59

wherein p'n represents the replacement value for the
incoming string label that is to be generated at time
tn by the phonetic element to yield the corresponding
end time.
40. A method of performing an approximate acoustic match
between incoming labels and words in a vocabulary where
each word is represented by a sequence of Markov models,
each Markov model indicating states and transitions
therebetween, the method comprising the steps of:
determining the probability of being at a subject state
at a given time, which includes the steps or (a) recog-
nizing each previous state that has a transition which
leads to the subject state and the respective probabil-
ity of each such previous state; (b) recognizing, for
each such previous state, a value representing the
probability of the label that must be generated at the
transition between each such previous state and the
current state in order to conform to the label string;
and (c) combining the probability of each previous state
and the respective value representing the label proba-
bility to provide a subject state probability over a
corresponding transition;

determining the overall probability of being at the
subject state from the subject state probabilities over
all transitions leading thereto; and
combining the overall probabilities for a given phone
in a word to determine the likelihood of the given phone
producing the string of labels.
61

Description

Note : Les descriptions sont présentées dans la langue officielle dans laquelle elles ont été soumises.

Y0983-0~9
L2365i7~7
APPARATUS AND ~ETHOD FOR PERFOR`1ING ACO~STIC `IATCHI~G
BAC~GRO~ND OF rHE INVE~TTION
I. FIELD OF THE INVENTION
The present invention relates primarily to the art of
speech recognition, specifically the art of perfo-ming
a statistical match between vocabulary words and incom-
ing labels produced by an acoustic processor in response
to a speech waveform input.
II. DESCRIP~IO~l OF THE PRIOR ART
Typically, the purpose of a speech recognition sys~em
or machine i5 to automatically transform natural speech
into some other form, for example written form. In
achieving this aim, various general approaches have been
considered. One approach is directed ~o simulating human
speech interpretation processes. Another approach is to
view speech in a statistical context.
In the statistical approach itself, several techniques
have been considered as suggested bv the Bahl, Jelinek,
and Mercer article, "A Maximum Likelihood Approach to
Con~inuous Speech Recognition," IEEE Transactions on

Y0983-029
-2- ~3~S77
Pattern Analvsis and Machine Intelligence, Volume
PAMI-5, Number 2, pp. 179-190 (1983). In the 8ahl et
S al. article, it is noted thaL the typical model of a
speech recognition sys~em includes a text generator
followed by a speaker, or talker. The text generator
determines what is to be said and the speaker produces
a natural speech waveform. The natural speech waveform
enters an acoustic processor the output from which en-
ters a linguistic decoder. Depending on the technique
employed, the above-noted elements may be associated in
various ways. Bahl et al. combine the speaker and
acoustic processor to function as an acoustic channel
wherein the speaker provides text as a speech waveform
and wherein the acoustic processor acts as a data
compressor which provides a string of labels (also re-
ferred to as symbols or fenemes) to the linguistic de-
coder. The labels may be generated in any of a number
of ways and are commonly identified collectively as the
string Y made up of sequential labels YlY2Y3---- The
purpose of the linguistic decoder is to represent the
original spoken text, in some prescribed form, based on
the incoming string of labels.
In the above~noted article, one acoustic -~
~rocessor the IBM* centisecond acoustic,
processcr (CSAP) -- is describe~
* Trade l~ark

Y0983-079
3 ~2~
as transforming the speech waveform into a slring of
parame~er vectors. Each parameter vector is compared to
stored prototypes (or standard vectors) -- the distance
from the parameter vector and each prototype being de-
termined. The "label" for the proto~;ype which is clos-
est is then assigned to the waveform parameter vector.
The label can have any of various forms and may be de-
termined in any of various 1rnown manners in accordance
with existing technology.
The purpose of the linguistic decoder is to perform a
matching process between the incoming labels and words
provided in the system vocabulary. In the probabilistic
approach set forth in the Bahl et al. article, the lin-
guistic decoder aims at de~ermining a word strinO l; that
has the highest probability of having produced the
string of labels Y1Y2Y3--- Mathematically, this is re-
presented by the expression:
2~ MaxPr(h¦Y~, (1)
the maximum probability of h given Y over all word
strings ~'. According to well-~nown prob~bility theory,
this can be ~-ritten as:
Pr~WIY)= Pr(W) x Pr(YIW)tPr~y) (2)

Y0983-029
-4- ~6~77
where Pr(Y) is independent of W and the probability of
a given word string W, namely Pr(W), is determined by a
language model in the linguistic decoder.
Suppose that at some point in the decoding process some
initial substring. for example YlY2 YT~ has been ten-
tatively decoded as the ~ord string WlW2...Wn. The
present invention is directed to determining a set of
candidate words W( +i) for which
( ~n+l)lYl YTYT+l YT'k' 1 n)
is relatively large --compared to other words in ~he
vocabulary - for some value of k.
In determining Pr(YIW), Markov modelling has been con-
sidered. The number of computations required by several
linguistic decoding techniques is noted in the Bahl et
al. article as being fairly high, especially with
larger vocabularies on the order of ~000 words and more
for example.
A key question in linguistic decoding has therefore been
how to determine Pr(Y¦W) for word strings from a vocab-
ulary without requiring inordinate computation time and
without sacrificing accuracy in decoding.

Y0983-029
~365~
This object and others are addressed by the present in-
vention.
SU~I!~RY OF THE I~VE~TIO~
In accordance with the present invention, a linguistic
decoder is provided which facilitates the determination
of which word or words in a vocabulary have the highest
probability of producing a par~icular string of labeis.
To achieve this object, the presen~ invention features
apparatus ,?nd method for performing a statistical match
of words to labels with a number of appro~imations that
expedite the matching determination without introduclng
undesired loss in accuracy. In addition, the presen~
invention relates to apparatus and method wherein ~.ords
having similar phonetic beginnings are matched against
the incoming labels simultaneously for so lonO as the
words have similar phonetic beginnings.
In noting the approximations embodied in the invention,
it should be recogni~ed that the invention models each
vocabulary word as a sequence of phones. ~ach phone is
represented by a phone machine. Each phone machine is
precisely characterized as having (a) a plurality of
states, (b) transitions from state to state and a prob-
2S ability associated with each transition, and (c) an ac-

Y0983-0~9
~ ~3~
tual output probabilit-y that a given label is produced
by a given phone machine at a given transition -- each
phone machine (corresponding to a given phone) defining
the probability that a given label is generated at a
given transition thsreof. It is possible to determine
match scores for words based on these characteris~ics,
however the number of computations is high. In accord-
ance with the present invention, prefer2bly each ohorle
machine is simplified by replacing the ac;ual l_bel
probability for each label at all transitions in a g ven
phone mach ne with a speciflc replacement value. The
specific replacement value is preferably selected so
1~ that the match value for a given phone with the rn-
placement value included is an overestimation o' tlls
match value achieved by the detailed match ~here ~he
replacement values do not replace the actual label
probabilities. One way of assuring this condition is by
selecting each replacement ~-alue so that no probabili~y
corresponding to a given label in a given phone machine
is greater than the replacement value thereof. 3y sub-
stituting the actual label probabilities in a phone ma-
chine with corresponding replacement values, the number
2_ of required computations in determining a match sco.e
for a word is reduced greatly. ~oreover, since the re-
placement value is preferably an overestimation, the
resulting rnatch score is not less than would have pre-

Y0~83-029
~3~S~7
viously been determined without the replacement. Ac-
cordingly, the object of recucing computation wlthout
eliminating likely candidate words is achieved.
Another approximation relates to an additional facLor
that enters into the determination of 2 match score be-
tween a word and a string of labels, n~mely a label
length dis~ribution associated with each particular
phone machine. That is, for each phone machine~ there
is a probability distribution that each nUmDer of labels
contained between a minimum L i and a maximum number
of labels Lmax is produced by the particular phone ma
chine. To further facili-ate and reduce computation in
1; accordance ~ith the invention, the l~bel leng~'l proba-
bility distribution (between the minimum length and
maximum length of iabels) is considered uniform so tha~
the probability of each length of labels (between L .
mln
and L ) is the same.
max
As an additional refinement! the present invention in-
cludes a limitation on the number of labels examined b~-
the phone machines in order to determine a match v~lue
between a corresponding word and a string of incoming
labels. This added feature achieves the objects of re-
2; ducing decoding delay and reducing inequalities arising

Y09S3-0~9
-8- ~2~S77
from compa-ing match scores for words of different
leng~hs.
o Furthermore, it is an object of the invention to derive
a list of candidate words b;- means of a basic f25~ ma.ch
(which approximates actu~l label probabillties for each
label in a given phone with a respective replaccment
value) or an alternative fast m~.ch lwhich also approx-
imates label length probabilities ~;ith a specifiec value
for a given phone) wherein the candida~e words are
processed by successive detailed match phone machines
and/or a language model in order to arrive at a single
word and, as appropriate, some possible alterna~i~es--
1~ especially where the fast match machines limit the num-
ber of ~ncoming labels examined to de~ermine a ma~ch
value for a phone.
In achieving the object of processing the beginnings of
words simultaneously, the present in~tention defines
words or portions thereof as phonetic baseforms arranged
in a tree structure. Each baseform, it is noted, is re-
presented by a sequence of phones each of wnich has i~s
own phone machine corresponding thereto. For each
baseformf a sequence of phone machines extends from ~he
2~ root of the tree. For so long as two or more baseforms
have similar phonetic begirmings starting from the root,

~09~ 0~9
g ~ 77
a commcn branch of phone machines is provided therefor
Two or more baseforms can hereby be selected or elimi-
na~ed as candidate words by simultaneously processing
~hem through Ihe same phone machines at the same time
for so long as the baseforms have sir"ilar beginnin~s
The object of reducing computation without loss in ac-
curacy is thereby rur her achie~ed
In a preferred form, the invention r_lates to perforl~ing
an acoustic match in a 1inguistic decoder wherein each
phone machine therein is characteri~ed as naving (a) a
plurality of stales and .ransi~ion pqths between sta-es,
(b) transi.ions tr(Sj/Si) havi!g probabili~ies Tti`j)
1~ each of which represents the DroD~qbili-'; OL a ~ransition
to a state Sj given a current state Si where Si and Sj
may be the same state or diffsrent states, and (c) actual
label probabilities wherein each actual label probabil-
ity p(ykli';) indicates .he probabili~y that a label y~
is produced by a given phone machine at a given transi-
tion from one state to a subsequent state where k is a
label identifying notalion; each phone machine including
(a) means for assigning to each Yk in said each phone
machine a single specific value P~(yk) and (b~ means for
2- replacing each actual output probability p(ykli~j) at
each transition in a given phone machine by the single
specific value p'(y~) assigned to the corresponding Yk

Y0983-0~9
.A 1 0
~2;3~5~7
Preferably, ~he replacement value is at least as great
as the maximum actual label probabilitv for the corre-
; sponding Yk label at any transition in a particular
phone machine.
The foregoing and o~her objec~s, fea~ures and advantages
of the invention will be apparenr rom rhe more par~ic-
ular desc-iption of the preferred embodiments o ~he
invention, as illustrated in the accompanying dra~ing.
BRIEF DESCRIPTIO~' OF THE DRAI;I~GS
FIG. 1 is an illus~ration of a ~ree s~ruc~ure of
baseforms according tO the inven~ion.
FIG. 2 is an illustration of an example detailec match
phone machine in accordanse ~ith the invention.
FIG. 3 is a trellis diagram showing the state t ar.si-
tions over time for the detailed ma~cn ?hone machine.
FIG. 4 is a general diagram cf a phone machine accord ng
to the invention.
FIG. ~ is a diagram showing a fast match computation
under predefined conditions.

Y0~83-0~9
-11-
~L~36~77
FIG. 6 is an illustration showing ~he derivation of a
starr-time distribution for a phone from the end-~ime
S dist-ribution generated by the phone machine correspond-
inO to the ~IOSt recent previous phone.
FIG. 7 is an illustration of an alternative fast match
phone machine accorair.~ to the in~-en.ion.
FIG. 8 is an illustration o' an ~lternativA fast macch
phone machine wherein the number of labels e~amined in
the matching procedure is limiced in accordance with the
invenlion.
: .
In the drawings, li~.e elements are des.cna~ed with sim-
ilar reference numbe~s~ and identical elements in dif-
1~ ferent specific embodimencs are designated by iden~ical
reference numbers.

Y~5~3-029 ~236S77
-12-
DESCP~IPTIC.~ OF PRE~Er~P~ED E~I~ODIMENTS OF THE I~-~E~lIO~
I. TREE STR~CT~P~E OF BAS~FOR`IS
; In Fig. 1, a plurality or -~70rcs are arranged in a tree
structure 100. Tlle tree has a root 102 from ~-hich a
pluralit~ of baseforms extend. Each baseform is a pho-
netic baseform and extends to a leaf of the tree~ eacll
leaf of the tree representing a ..orc in the vocabulary.
Pre~erably, the voc2buiary is on the order of SOOO-horis
or more, although smaller vocabularies may also be em-
ployed.
Each baseform is shohn including a string of pllones,
each baseform s~ar~ing at tne roo~ 102 of ~he tree
1~ structure 100 and ending at a respective leaf. A number
of baseforms, it is noted, have similar phonetic begin-
nings. For so long as a plurality of Dase,~orms na~e
similar phonetic beginnings, such baserorms share common
initial branches along the tree structure 100. Starting
at the tree root 10~, the baseforms 10~ through 118 for
manager, managers, memo, memory, memoranda, and memo-
randum, respectivel~y, all extend along a common branch
identified as the phonetic element MX. Similarly, the
baseforms 112 through 11~ further extend together along
~ the branches identified as the phoneti_ elements ~X,
EHO, MX. Where the baseforms de?art phonetically, the

~0~83-Q~9 -13~ 6S77
tree structure lG0 branches in different direclions. Tn
accordance with the inventior, the malching of ?hone~ic
S elemenls ~l~ and then EH0 is performed once for all the
baserorms 112 through 118. That is, baserorms are not
e.xamined se?aratek- one at a time. Instead, processino
takes place along ~ranches in the tree structure 100 so
tnat words na~ing similar phonetic oeginnin~s are ?roc-
essed simultaneousl5- for so lona as ~heir res?ecti~-e
phones lie along a common path of branches. ~y ?rsc-
essing along the common path, a number of words ma~- De
considered simultaneouslS- for inclusion in a list of
candidate words lrom which the actual recognized wora
lo may be selected. Li~ewise, a number or WO.da can be
eliminated simultaneously.
In Fig. ll the phonetic elemenls are phones each of which
corresponds to a conven-tionally definec sound. ~-hen
strung together, the phones form word-representinv
baseforms. For example, a first pronuncialion of the
word "the" is represented by ~he phone DH followed by
the phone ~-Hl. Approximately /0 ~o lOG phones, when
appropriatel~- selected and s~rung -~ogether, can re?re-
sent the words in the English 'anguage.
~; A word --when pronounced in ~ore than way-- is repres-
ented in Fig. 1 by distinct strings of conventional

Y0983-0~9
-14- ~36~7~
phones, as su~gested b-~ the disLinct baseforms for the
two pronunciations of the ~ord "~he", namel~ lHLi and
THE2. Alternativel~, the same ~ord mav be represer.ted
by a basefor~ of "clinks" wherein each ciin~ can e?re-
sent various ?ronunciations of a par-~icular por~ion of
a word. The t~-o pronunciations a~ter the DH for the word
"the" could be represen~ed by 2 sir.Ole clink which would
account for ~he ~Hl and EEl phones. Similar1~T-, a se-
quence of phones ma~- be in par311el wilh one or more
other phones as alternative pronunciations for the same
segment of a ~-ord or string of words --the parallel paths
together representing a clink. Darallel p~ths which
l~ cross word boundaries can also be re~re.,en~ed by a
c.link. Whethsr defininO basefor~s in cerr.ls or conven-
tional phones or clinks, similar principles .~re in-
volved. Hence, in this patent application, references
hereafter to "phones" are considered generic to inclu~e
conventional phones, clinks and other sim_la pllonetic
elements which represent elemental inputs.
II. PHONE ~1ACHINES
A. I~TROD~CTION
In the present invention, a statis,ical àetermination
is made as to ~hich words in the vocabulary have the

~'~983-079
highest proba'oility OL^ hz~ing generaled a string of in-
coming labels generated by an acoustic channel. .4s
noted herelnbefore, each h~ord is represented by a se-
quence of phones. Each phone is, n turn, character zed
by a respecti~-e ohone machine. Each phone machine storcs
data which provides an indication of the likelihood that
the phone corres?oncing there~o generates any label
string. ~hen a particular label s,.ing s entcred in~o
a gi~en pnone machine, the gioen phonc macnine derer-
mines --from its stored aata-- the li~;elihood ~hat he
given phone machine oroduces the p.~rticular incom~n~
label string.
l; ~s described below~ phone machinns o~ SeVera1 L~.'peS .1. C
em'bodied in accordance ~-ith the inven~ion. Fi.st~ .he.o
i5 a detailed match phone machine ~hich incorporates
predefined probabilities into a phone rllodcl. The de-
tailed match phone machine is charac~erized D~' ~a) a
plurality of states and l_ansitions between the s~ates:
(b) a probability associated hith each transition, that
the particular transition occurs; and (c) a probabil'ty
that, at a given transition, a particular label is gen-
erated by the phone. ~;hile the detailed match phcne
7; machine can be used in determining ~:ith grea~ exactness
how closely a phone matches a string of incoming labels,
tremendous calculation demands attend the detailed match

YQYS3-0''9
-16- ~65~
phone machine. To overcome ~he calcula.ion demands as-
sociated with the cetailed match phone ma_nine, a seccnd
phone machine is considered by rhe invention. The secvnd
phone machine, hereaf.er refer-ed to as a. basic ~~ast
match phone machine, mai;cs an appro.~imation ~ha~ sim-
plifies calculations relative LO the detailed match
phone machine. Specifically~ wherever a given lahcl has
cL probability of occurring at any .r_n~i~ioD in a gi~-?.n
phone, tha~ probability is assigned â single specific
value --which preferably is at least as large as the
highest probabilit~ of the label occurring at an,~ tr~rl-
sition in the phone. In a further embodiment~ h~reaf-er
1~ referred to as the alternatLve f~st ma~ch, the proDa-
bility of a phone genera~ing ~ string o' labels of an~
of a number of lengths is considered unirorm 'oy the pnone
machine corresponding thereto. In the alternative fast
match phone machine, a minimum Length and a ma~i~um
~0 length are specified bet~-een ~-hich tn_re is defined a
length distribution. In the al~ernative fast malch Dhone
machine, the probability OL any length ~-ithin the leng~h
dis~ribution is repiaced by a single defined ~alue and
tr,e probabilit~; of any leng~h ou~side ~he disrribu~ion
is zero. The alternative fast malch phone machine can
be applied to the basic fast match phone machine to
further reduce the amount of calculation required in

'tO~S3-0~9
- 1, - ~.;~36S77
determinin~ whe~her a word defined b~- a string of ~n~ne
machines qualifies as 2 candidate t;ord.
In accordance t~ith the in~ention, a list OL- words is
e.Ytrac~ed from the vocabulary of~;ords by processing the
hords through either basic fast match phone machines or
through alternative fast match phone machines that
pre~er~bl~- includes both label probabiiity reD~aceme!lt
values and length distribution replâct-menl vâlues. .he
words in the ex~rac~ed list of words are thereafter
processed by detailed match ?hone machines to arrive at
a single word. In this regard, i, should be reali~ed c"at
a langua,ge model (not described hereinj is also pr~,er-
1~ ably provided to eliminate inzppro?riate words ~an-
guage models, such as models emplG~-ing tri-grams, h~ve
been discussed in the art. Briefk , a language model
emploved with the in~ention includes probabililies ror
respective three t~ord sequences occurring, such proba-
bilities being based on data deri--ed rrom a text. Spe-
ciricall~, the language moeel is im?lemented in PL/I on
an IB~I 4341 using an IB~i 33S0 dis~ drive ror the lang~ave
model s~orage. The IB~I 4341 is also connected to the L~S~
mâtch processor, the detailed match processor. and the
~5 rront end processor. In addition, the IB~' 4341 is also
connected to â work station implemented in, for example,
Pascal on an Apollo Domain computer to enable a user to

~0983 ~07~
36~77
-18-
enter input to the s stem. The wor~ station is connected
to the host IBM ~341 through an IB~ person31 computer
and a ~,Q'~ commur.icztions conlroller.
E~ch of the above-notec phone machines receives as ~n-
puts a start-lime dis~ribution and a string of labels
and determines therefrom a match value t~hich SUggeatS
the ll~elihood that a given phone produces a sequence
of labels in the string.
Each of ~he above-mentionea typGs of phone machines is
now described in greater detail.
B. DETAILD ~TCH PHONE ~IACHI~E
In Fig. 2, a sample detailed match phone machine ~00 is
1~ depicted. Each detailed match phone machine is a ?rob-
abilistic finite state machine cnarac~erized bv ~aj a
plurality of states Si, (b) a plurality of transitior.s
tr(SjlSi), some of the transitions extending between
differen~ states and some e~tending from a stale back
7 to itself, aach transieion having associated therewith
a corresoonding probabiliry, and (c) for each label that
can be generated at a Darticuiar transition~ a corre-
sponding actual label probability.

~0~o3-0~9
i77
-19-
In Fig. ~, seven states 51 th..ougn 5 are provided and
thirteen transitisns trl ~hrou~h trl~ are pro~lded in
; Ihe detailed match phone macnine ~OO. A review o' Fig.
'~ shows that phone machine '~OO has three transitions
with dashed line paths, namely transitions trll, trl~,
and trl3. At each of these three transitions, the Phone
can change Lrom one state to ano~her with3ut prod~cing
a label and such a transi,ion is, accordirgl~-~ referred
to as a null transition. Along transitior.s Lrl through
trlO labels can be produced. Speciri^all!,-, along each
transition trl through trlO, one or more labels may ha~-e
a distinct probability of being gener~.ed the.eat.
1~ Preferably~ fo- each transl.tion ~here is a ?robabil e;
associated ~ith e~ch label ,h-t c~n be ge~er2ted in the
system. Tha~ is, if there are ~wo hundred 'abels ~ha,
can be selectively generated by the acousric channel,
each transition ~that is not a null) has ~wo hundred
"O "actual label probabiliLies" associated rherewith --each
of which corresponds to the p-obability ~hat a corre-
spondinv label is generated b~- the phone a, the partic-
ular transition. The actual label probabilities for
transition trl are represented b~ tne symbol p followed
rr, b~- the bracketed column of numerais 1 through ~OO~ each
numeral representin~ a given label. rOr label 1~ ~here
is a probability pll¦ that the detailed pllone machine
"OO generates the label 1 at transition trl. The ~rarious

VQ98~-0~9
-20-
actual label pro;~abilities are stored with relacion to
the labei and a corres?onding Iransiticn.
S When ~ st.ing of labels YlY~Y~--- is presented to a de-
tailed match phone machine ~uO corresponding to a Oiven
phone, a match procedure is performed. The procedure
associated wilh the detailed malch phone machir.e is e.~;-
plained ~.~ith rere_ence to Fi~. 3.
.
i.C Figure 3 is trellis diagram of the phone machine oi Fig.
2. As in the phone machine, the Ireliis diagram shohs a
null transition from slate Sl to state S_ and ~ransi-
tions from state Sl to sta e S~ and ' om sla~e Sl ~o
state S4 The transitions be~i.;ecn othor s~ales are ~lso
1~ illustrzted. The trellis diaOram also sho~;s time meas-
ured in the horizon~al direction. Start-time probabili-
ties qO and ql represenl the probabilities that a pnone
has a start time at time t=to or t=tl, .espectivaly, for
the phone. At each start time tQ and 'l~ the various
transitions are shown. It should be noted, in this re-
gard, that the inrerval belweer. successive atart ~and
end) times is preferably equal in length ~o ~he time
inter~al of a label.
In employinO the detailed match phone machine ~00 to
determine how closely a given phone matches the labels
:.,.,,., .: :

Y09~ 9
57~7
-Zl-
of an incoming string, an end-time distribution for the
phone is sought and used in determining a match value
for the phone. rne notion of rel~-ing on the end-ti~.e
distribu-tion is common to all embodiments of phone ma-
chines according to the inven~ion. Tn genera~ing the
end-time distribution to perform a delailed match, the
delaiied match phone machine 200 involves computations
whicr. are exact and complicated.
Loo~in~ at the trellis diagram of ~ig. 3, we first con-
sider the computations required ~o have both a start
time and end time at time t=to. For this to be the case
accorting to the example phone machine structure sec
lS forth in Fig. 2, the following probaDilit-; aDpi~es:
Pr(S7~t=tO) = qO.YT(1-'7) ~ pr(s~r=to)~:r(~
Pr(S3,t=tO)xT(3~7) (3)
where Pr represents "probabilit~ of" and T represents
the transition probabilit~- bet-~een the two parenthet-
ically identified states. The above equation indica~es
the respective probabilities '~or the three conditions
under which the end time can occur at time t=to. 'lore-
over, it is obse.ved tha~ the end time a. ~=to is limited
in the current exampie to occurrence at state S7.
2S Loo~:ing next at the end time t=tl, it is noted that a
calculation relating to ever~ state other than state S

~98~-0~ 3~577
must be c,ade. The state Sl starts a~ -ne end time of the
previous phone. For ?urposes of e~planation, only the
calculations pertaining to state S, a.e set for~h.
~or state S4, the calculation is :
Pr(S4,t=tl)= Pr(Sl,t=tO)xT(1~~4)xPr(yl1~4)
Pr(s4~t=tO)~T(4~ Dr(vl4-~4) (4)
In words, the equation ~4) set .orth mme&iately ab3~-e
indicates that the ?robabilit; of the p!~one machine be-
ing in state S4 at -ime t=tl is de?endent on the sum of
the follor~ing two terms (a! the probabilitv of oein~ at
state Sl at time t=t~ multiplied Dy t~e probâL)ilit-. !-.;
of the transition from state Sl ro s~ate S c.ul~i~iied
l; further by the probability (Pr) of a Oiven label ---.--
in the string being generated gi~ren a transition rrom
state Sl to state S4 and (b) the probability of beir.g
at state S4 at time t=to ~ultiplied by the probabili~y
of the transition from state S, to itself and further
multiplied by the probability of generating the given
label --y-- during and gi~en the transition from state
S to itselE.
Simil2rly~ calculations pertaining to the other states
(excluding sta'e Sl) are also performed to generate
corresponding probabilities that the phone i5 at a par-

Y0983-029 ~236S ~7
ticular slate at time t=tl. Gensrall~-, in de~ermin_ng
the probabilit~ of bein~ at a subject state at a given
time, .he inven~isrn ~a) recogni7es each previollâ sta-e
that has a rransition which leads co the subject slate
anci the respec ive probability of each such previous
state; (b) recognizes, for each such pre~ious s~ate, a
value rep.esentirlg ~he prosâbilit~7 of the label that
must be genera~ed at the trar.sition berween each such
previous state and .he current state in order to con.orm
the label slring; and (c~ combines the probability
of each pre~ious slate and the resoecrive vdlue r~pres-
enting the label probaDilic~; to provids a subjecr. slare
1~ probability over a correspollaing transition, rhe overall
probability of beino at che subiec~ sr.a~e beinO deter-
mined fxom the subject stare probLbilities ove. all
transitions leading thereto. The calculation ror state
S7, it is noted, includes lerms relating to he three
O null transitions which permit the phorle to start and end
at time t=tl with the phone ending in state S7.
As with the probabilit~ dererminations relLrive to timss
t=to and t=tl, probabili~~- determi-lations for a series
of other end times are prefe-abl~ g~nera.ed IO form an
~~ end-time distribution. The value of ths end-time dis-
tribution for a given phone provides an indica-cion of
ho~ well the given phone marches the incoming labels.

Y09~3-0~9
3L;;~3~S77
",
In determining hot.~ woll a word m2tches a st~ing of in-
coming labels, the phcnes wnich represent the ~-ord ~re
processed in seauence. Each phone generates ar. end-time
distribution of probability v21ues. a match value ror
the phone is obtained b~T summing up the end-time prcba-
bilities and then taking the loOarithm of that sum. A
starl-time distribution for the ne~-.t phone is deri~ed
b~- normaii~ing Ihe end-time dis.ribution b~-~ for e~am-
ple, scaling each value thereof b~- di~-idinc each ~alue
by the s~lm so thaL the sum of scaled ~-alues totals one.
It should be reali ed that the present in~ ntion cou-
templates two rnethods of determinirlg h~ ~he num~er of
phorles to be e.~amined for a gi~en word or word s~rlng.
III a depth first method, compu~atiorl is made along a
baseform --compu~ing a running sublo~al with each SIIC-
cessive phone. ~hen the subtotal is found to be beiow a
predefined threshold for a given phone position
~0 therealong, the computation terminates. t~lternati~el~-,
in a breadth fi st method, a compu-ation for similar
phone positions in each word is made. The CCmDUtatiOns
following the iirsL phone n each word, the second phone
in each word, and so on are made. In the breadth first
2.~ method, the computaticns along the same number of phones
for the various words are compared at the same relati~e
phone positions Iherealong. In either method, the

'~09~3-079
- ~,
~;~;3~77
word(s) ha~7ing the l~rgest sum or match values is ~~e
sought obie Ct .
; As noted previousl~, a language moael which stor~s in-
rormation --such as tri-grams-- relating to words in
context ma~ be included to enhance the prob2Dilit~ of a
correct ~ord selection. Langua~,e modsls have bee~ re-
ported in the iiterature.
'0 ~le detailed match has been impiemenled in .~PAL ~;rray
Processor Assembl~j Language) which is the native assem-
bler for the Floating Poin~ S~-ste~.s, Inc. l90L. In his
regard, it should De recognized ~hat the ie ailed m.atch
reauires considera~le memor~ for stor_ng each o, the
actual label probabilities (i.e., the probabilit-j- th~t
a given phone generales a gi~-en label ~- at a given
transition); the transition probaDilities for eacn phone
maohine; and the probaDilities of a given phone being
at a given state at a gi~en ~ime after a defined start
time. '~e above-r.oted i90L is set up to ma~e the .arious
computations of elld times, match values based on ,for
example, a sum --prererabl~- the l-,garithm of the sum of
end time probabilities; start times based on the prsvi-
ousl~ generared end time probabilities; and ~orc match
2~ scores based on the match values for sequential phones
in a ~ord.

'~09&3-~ 7 ~
- 7 6- ~ ~3~57~
Because the detailed match is c~mputationally eY~pensive~
the present in~-ention includes a basic fest mz~ch and
o an aiternative fas~ match which reduces the compula ion
requirements withou~ sacrificing accuracy.
The fast ma~ch embodiments are employed to define a list
of on the order of tan ,o one hundred candida~e words
selected as the mos~ ely words in he vocabulary to
lG correspond to the incoming labels. The candida e words
are preferably subjected to the lan~ua~e model and to
the detailed match. By paring the number of ~-ords con-
sidered by the detailsd match to on the order of 1~ of
the words in the vocabulary, rns compu~ational cos~ is
'5 greatl~7 reduced while accuracy is main~~inea.
C. BASIC FAST MATCH
The basic fast match simplifies the detailed match by
replacing with a single value ~he actual label prooa-
bilities for a given label at all transitions a~ which
~0 the given label may be gener~ted in a given phone ma-
chine. That is, regardless of the transition in a given
phone machine ~-hereat a label has a probability of oc-
curring, the probabilily is replaced by a single spe-
cific value. The value is preferably an overestimate,
being at least as great as the largest probabllity of

~0~83-079 ~36577
.
the label occur-ing at any transition in ~he given phone
machine.
; By setting the l~bel probabilit~- replacement value 25
the maximum of the ac;ual label probabililiss fo- the
given label in the gi~en phone machine~ it is assured
that the match v~lue generated with the basic fast~.asci~
is at least as high as the match ~-aiue th;.t would .2s~1
from employing the detailed ma~ch. In ~hls way, the
basic fast match typic~lk- o~,-eres~imates th" matCil ~-21u~
of each phone so that more words are generall selected
as candidate'words. ~ords considered candisi~es acco c-
ing tG the dstailed match also pass muste~ in accordanc~
with the basic fast match.
Referring to Fig. 4, a phone machine 40G for the basic
fast match is illustrated. Labels ~aiso referred to as
symbols and fenemes) enter the basic fast match ?hone
machine 400 together ~ith a start-time distribution. Ths
~0 start-time distribution and the label string inpu~ is
like that entering the detailed match pllone machine de-
scribed hereinabo~-e. I should be realized chat the
start time may, on occasion, not be a distribution ~ver
a piurality of tlmes but may, instead, represent a pre-
~; cise time --for example following an interval of si-
lence-- at which the phone begins. ~ihen speech is

Y0983-0~9
6~i~77
soncinuous, however, Ihe end-time dist ibution is used
to define the start-cime dis~ribution (as is discussed
o in greater delail hereinbelow). The phone macr.ine 400
generates an end-time distribution and a match v~luo for
the particular pnone from the generaced end-time dis-
tribution. The match score for a ~ord is defined as the
sum of match v~lues for co3ponent phones --~t least tne
rirst h phones in the ~;ord.
Referring no~ to Fig. ~, z diagram of a basic fast match
CompUtaliOn is illuscrated. The basic fast match com-
putaeion is only concerned ~ h the start-lime di3trib-
ution, the number --or length Ot- labcls-- prcduced by
lo the phone, and the replacement values D~ associat-d
with each label Yk. By subst tuting all aclual label
probabilities for a given label in a given phone machine
by a correspondlng replacement value~ the basic fast
match replaces transition prob bilities with length
distribution probabilities and obviat s the need for
including actual label probabilities ~hhich can difrer
for each transition in a given phone macnine) and prob-
abil ties of being at a given scate at a given ti~e.
In this regard, the length distributions are determined
2~ from the detailed match model. Specifically, for each
length in the length distribution, the invention pref-

Y0~3-0~9
3~577
erably examirles each slate individuall. and determines
for each state the various ~ransition paths b~; ~nich the
currentlv ex2mined state can occur ~a? glven a partic-
ular label length and (b) recardlsss of the out?uts
along the transitions. The probabilities for all tran-
sition paths of the particular length to each subject
s,are are summed and the sums for ~11 the subject states
are ,hen adaed to indicate the probaDilit~ of a Oi--en
len.gth in the distribution. The above procedu-e is re-
peated for each length. In accordance ~i~h the pre~erred
form of Ihe invention, these computations are made with
reference tC a trellis diagram ?S is known in ~he ~rt
1~ of ~arko~ modellinc. For transi~ion ?aths w-hich share
branches along the trellis s~ructure, the compll~aticn
for each com~on branch need be maae onl)7 once and is
applied to each path that includes the common branch.
In the diagram oE Fig. ~, th-o limita~ ons are inciuded
bv hay of example. First, it is assumed that the length
of labels produced b~- the phone can be zero, one, tho,
or three having respective probabilities of lo~
and 13. The start time is also li~.ited, permi .ir.g onl~;
four start times having respective probabilities OI q0
7~ ql, q7, and q3. With these limitations, the following
equations define the end-time distribution of a subject
phone as:

~0983-Q~9
~23~577
-30-
~o = qOlo
~1 = qllo + qO1121
_ ~ ~ q A 1 0 q 1 1 2
q~ 3 ql-~P P3 qO 3-1P~P3
q~llP4+ q212P3P4+ql 13P2P3 P4
~;=q3l2pip5+ q2l3p3p4p;
~= q313p4~
In sxâmining the equa ions, it is ooser~-ea ~hat ~3 in-
cludes a ~erm correspondinO to ~ach or rour start times.
The first term represents the probabiii~y that the phone
starts at time t=t3 And produces a length of zero l~Dels
--the phone slartin.~ ~nd ending a~ the same time. ~le
second ter~ represen~s the prob~bili y ~hat ~he pnone
starts at time t=~2, that the length or labels is one,
and that a label 3 is produced by the phone. The third
term represents the prob~bilit~; that the phone star~s
at time t=tl, that the len~th Oc labels is two ~nar..ely
labels 2 and 3), and that labels 2 and 3 are produced
by the phone. Similarlyj the fourth ~erm represents the
probability that the phone starts at time t=to; tha~ the
lenOth of labels is three; and that the three labels 1
2, and 3 are producsd by the phone.
7~ Comparing the computations recuired in the basic fast
match with those required by the detailed match su~gest

Y~3-0~ 5~7
-31-
the relative simplicit~ of the former relative to the
latter. In this regard, it s noted that the p'y ~alue
remains the s2me for each a?pearance in all ~he
equations as do the label length probabilities. `lore-
over~ with Ihe length and start time~limitations~ .he
computations for the later end times become simpler. For
e.~ample, at ~6~ the phons must start at time t=t and
lQ all three labels 4, ;, and 6 mUsl be produced o~- che
phone for that end time to ap?ly.
In generating a match value for a su~joct p~one~ the end
time prob~bilities along the deiined ond-time dislrib-
ulion are summed. If desired, the iog of t~e sllm is ta~n~n
:1~ to provide the e.YprCSSiOn:
match value - logi0(30 l --- +
As noted previously, a match score for a word is readil~
determined by summlng the malch values for successi~e
phones in a particular ~-ord.
.
: 20 In describing she genera ing of the star~ time distri~-
ution, reierence is made to Fig. ~. In Fig. 6(a), the
word THE1 is repeated, bro~en down into its compone.nt
: phones.~In Fig. 6~b), the string of labels is d~epicted
over time. In Fig. 6(c), a first start-time distribution
2; is shown. The first start-time distribution has been
.
1.

Y0')83-0~9 ~36~7~
3~
derived rrom the end-time dist~ibution of the most re-
cen. previous phone (in the previous word which mav in-
o clude a "hord" of silence). ~ased on the label inputs
and the start-time distribution of Fig. o(c', the end-
time distribu~ion for the phone Dh~ ~DH~ is generaled.
The start-time distribulion for the ne~t phone, ~H, is
determined by recognizing the time during which the
prs~.~ious pnone enc-time dis~.ibution exceeded a thresn-
old (~) in Fig. o(d). ~A) is de~ermine~ inditiGually for
each end-time distribu~ion. Preferabl~-~ (h) is a fur.c-
tion of the sum of the end-time distribution values for
a subject phone. The inter~-al bstween ~imes a and b ~hlls
1~ reprssents ~he time during which ~hs ster~-ti:ne dis-
tribution for the pnone ~'H is ss~. (See Fig. ote).) The
interval between times c and d in F~g. o~sJ corresponas
to the times between which t~.e end-~ime distribution for
the phone DH e~ceeds the threshold (A) and bet-weer which
the start-time distribution of the ne~L phone is set.
The values of the start-time dis~ribution are obtained
bv normalizing the end-time dis~ribution by, for e.~am-
ple, dividing each end-time value b~ tlle sum OI the
end-~ime values wh ch ex~eed the threshoid (A).
2~ The basic fast match phone machine 400 has been imple-
mented in a Floating Point Systems Inc. 190L with an
~.PAL program. Other hardware and softhare mav also be

o ~ ~
~LZ~S~
-33-
used to develop a specific form of Ihe invention by
following the teachings se-~ forth harein.
C. ALTER`;ATI~E FAST `IiTCH
The basic fast match employed alone or, preferably, in
conjuncrion -~;ith the detailed match and/or a lar.gua,e
model greatlv reduces computation requirements. Io -u.-
ther reduce computation21 requiremen,s, the pr~sent ir-
vention iurther simplif-es the detailed match bv
defining a uniform label length d stribution bet~een t~-o
lengths --a minimum length Lmin and ~ maximum len~t;
L . In the basic fas~ match, the ?ror;abilities of a
max
phone generating labsls cf a given leng~h --namely i~,
1~ 11, 12, etc.-- typically hav2 differing valuss. ~ccord-
ing to the alternative fast match, ~he probabilitv for
each length of labels is replaced by a single uniform
value.
Preferably, the minimum leng h is equal to the smallest
~0 length having 2 non7ero probabilit)~ in the original
length distribution, although other lengths may be se-
lected if desired. The selection oi the maximum length
is more arbitrary than the selection of the minimum
length, but is significant in that the probability of
lengths iess than the minimum and greater than the max-

~&3-0~9
7~
imum ~re set as zero. B~ defining the length r)robabilitv
to e.~ist between only th~ mir.imum ler.g~h and the;na~:imum
lenath, a uniform pseudo-dis,ribution c&n be set fo:-Lh.
In one approach, t~le uni'orm p-obabil~t~- can be set 2S
the average probabilit~ o~er the pseudv-distribution.
Alternatively, the unirorm probabilit~ can be set as the
ma~imum o. the lenOtr. probabili~ies that are replaced
bi7 the uniform ~-alue.
The effec~ of characLer,~ing all the l~bel ier~.gth ~rsv-
abilities as equal is resdil~- obse~vcd hith reference
to the equations se- forth above for the en~-~ime cis-
trib-ltion in the bssic f2st ma~ch. Specificali~ .he
1~ length probabilities can be fac~ored OUt ?S a consta!~t.
ith L i being set at zero and all leng~h ?robabilities
being replaced by a single constznt value, the end-time
distribul;ion can be characterized as:
= ~ /1 = q + ~ P
m ` m m m-l m
wllere "1" is the single uniform replacement ~alue and
where the ~alue for p corresponds prererably ~o the
replacement value for a ~iven label being gener~ted in
the given phone at time m.
~or the above equation for ~ , the malch value is defined
2~ as :

Y0983-0~9
3~ ~3~
match ~-alue = log10(~0 + ~1 + ~~ + ~m) glû ~
In comparirg the basic fast match and the alterna~i~e
fast match, it has beerl found that -he number of requi-ed
additions and multipli-ations are greatl~- reauced by
employing the alternative fas~ match phone machines.
~Tith L = û, it has been found that the basic fast
mln
malch requires fort~ multipllcaLions and ~t.~ent~ addi-
lû tions in that the leng~h probabilities must be consid-
ered. ~t~ith the alternative fast match, ~ is determined
recursively and re~uires one multiplication and one a~-
dition for each successi~Te ~ .
m
To further illuât~a~s ho~ ~he al;hrnati~e fas- m~tch
l; simplifies compu~ations, Figures , and ~ are provided.
In Fig. 7(a), a phone machine emDodimen~ 7ûO corre-
sponding to a minimum leng~h L i =û is aepic~ed. The
maximum length is assumed to be infir.i e so that ~he
length distribution may be characterized as uniform.
In Fig. 7(b), the trellis diagram resulLing -rom the
phone machine 700 is sho~n. Assuming -h3t stl,t times
af~er q are outside the start-time distribution, all
det~rminations of each successive ~ ~;ith m-n require
one addition and one multiplication. For determinations
2~ of end times thereafter, there is only one required
multiplication and no additions.

~09O3-0~9
-36- ~23~577
In Fig. 8, L i = '~ Fig. ô(a) illustrates a specific
embodiment of a phor.e machine 800 therefor and Fig. S(b
shows a corresponding trellis diagram. Because L i ='
the trellis diagra~ of Fig. 8~b) has a zero probabili ;
aiong the paths mar~ed u, ~r~ h'~ and 7. For those end
times which e~tend betheen ~4 and ~ , it is noced that
four multiplications and one addition is required. ror
end times ~reater than n+4, ons multip]ic~tion ~nd no
additions are required. This embodim nt has been imDle-
menled in AP.~L code on a FPS l~OL.
It should be noted tha~ additionai s;ates may be adaed
to the Fi~. / or Fig. o embodimAnts as c'esired in ~c-
1; cordance with the in~ention. For s~:amDlA, ar.y number
of states with null ~rarsitions ma~ be included witr~out
altering the value of L .
mln
D. IIATCHI~'G BASED 0~ FIRST J LABELS
As a iurther refinemenl to the basic fast matcn and al-
ternati~e fast match, the p~esent invention contemplates
that only the first J labels or a string which enters
a phoIIe machine be considered in the match. Assuming
that labels are produced by the acoustic processor of
an acoustic channel at the rate of one per centisecond,
~5 a reasonable ~-alue for J is one hundred. In other ~-ords,

iO'~o3-0~9
,7- ~z3~77
labels corresponding to on the order of one second of
soeech ~-ill be proiided to determine a match be.tween a
phone and the labsls entering the phone machine. B~
limiting the number of labels e.iamined, two advantages
are realized. FirsL~ decoding dela~ s reduced and,
second, problems in comparing the scores of short words
~ith long words are substantially a~oided. ~le len~th
o~ J can~ of course~ be variea as desired.
The effect of limiling the mlmber Gf labels e~amined can
be noted with reference to the trellis dia~ram of r ~.
8~b). Without the presen. ref~nement, the f2s~ match
sco~e i5 the sum o. the p-obabilities OI ~. ~S a]on~ ~he
1; bottom row of the dia~ram. That is. the ~robabi' ir, of
being at state S4 at each time sLarting a~ ~=to (for
Lmin=O) or t=t4 (for Lmin=4) is determined as a a and
all ~ '5 are then totalled. For Lmin=4, ~here is no
probability of being in state 54 at an~ ~ime bei'ore ~4.
~O With the refinemen~, the summing of a ~5 termiIlates at
time J. In Fig. 8(b), time J corresponds to time t +,.
,
Termina~ing the e~amination of J labels over J time in-
` :
tervals can result ln the following two probabllit~
summations in determining a match score. First, as de-
2; scribed hereinbefore, there is a row calculation along
the bottom row of the trellis diagram but only up to the
..

~G983-0~9
~:~3~77
, ~
time J-l. The probabilities of being in state S4 at each
time up to time J-l are summec to form a row score.
; Second, there is a column score which corresponds tO the
sum of probabilities that the phone is at each respec-
tive state S0 through S4 at time J. That is, the column
score is: 4
column score = ~ Pr~Sf,J)
f=0
The match score for a ?hone is obtained by summin~ the
roh- score and column score and then ~a~ing ~he logarithm
of that sum. To contlnue the fast m~tch for the next
phone, the values along the bottom row --prefer~bi~- in-
cluding time J-- are used to derive the next ?hone
l; start-time dislribution.
After determining a match score for each of b consec-
utive phones, the total for all phones is, as before
noted, the sum of the match scores for all the pnones.
In eY.amining the manner in which the end-time prob~bil-
ities are generated in the basic fast match and alter-
native fast match embodiments set forth abcve~ it is
noted that the determination of column scores does not
conform readily to the fast match computations. To bet-
ter adapt the refinement of limiting the number of la-
bels examined to the fast match and alternative match,

'~0~&3-0.~9
39 ~23~577
the present inven~iGn provides tnat ths column ssore be
replaced by an additional ro-~ score. That i5, an adci-
tional rot.~ score is determined ror the phone being at
state S, (in Fi~ure 8(bj) bet~;een t1mes J and Jt~ -~-nere
~ is the maximum number of states in~any phorle machine
Hence, if any phons machine has ten states, the present
refinement adds ten end times alonO the bo~tom ro~i of
the trellis for each of ~hich a probabiliby is deber-
mined. All the probabilities along the bott3m ro~ up to
and including Ihs prob~bility .?t ~ime J.~ are added to
produce a ma~ch score for the given phone. ~s before,
consecutive phone match values are summed to provide a
1~ : word match score.
This embodiment has been implementcd in ~?.~L cGde on a
FPS l90L; however as with other portions of the in-
vention may be implemented hith othe~. codcs on other
systems.
~: :
0 E. THE TREE STR~CT'~RE L~D rAST M.~TCH EM80DIMENTS
.
By employing the baalc fast match or al~ernative fas~
match --hith or without the maximum label limitatiorl--
the computatlonal time reauired in determining phone
match values is tremendously reduced. In addition, the
computational savings remain high even when the detailed
.

yo983-0~9 ~36~7~
-40-
march is performed on ~r.e-~ords in rhe fast match deri.ed
list.
Tlle phone match values~ once determined, are compared
alon~ the branches of the ~ree structure to determine
which paths o' phones are most probable. In Fig. 1, the
phone match values for DH and ~Hl should sum to a much
higher value for the spo~en ~ord "the" th1n the various
sequences of phones brancning from the phone ?1.~:. In th s
regard, it should be o~served that the ?hone match ~alue
of ~he 'irst ~1~ phone is computed only once and is ~hen
used for each baseform e;~tendir.g thererrom. In addition,
when the total score calculatetl alon~ a first seauence
l; of branches is -ound to be much lo~;er th~n a ~hrcsholc
value or much lot~er than the total score tor other se-
auences of branches, all b seforms e~tending rrom the
first sequence Day be simultaneously e~limina.ed as can-
didate words.
~0 With the fast match embodimGnts and the tree structure,
an ordered list of candidate hords is generated t~ h
greaL computat~onal sa~ings.
F. F~RTHER E~30DI!IE~TS

~ ~Q~ 3657~
Thus, r~hile the in--ention has been described ~ith ef-
erence to preferred embodiments thereof, it ~iill be un-
; derstood b~T those sl.~illed in the art .hat ~arious
changes in form and details may be made withou~ de?ar -
ing from the scope of the in~ention. ~or e.~;amDle, al-
though it is preferred that each replacement ~-alue be
no smaller than the ma.~imum actual prob~bilit~; it re-
places, ot;~er methods ma~- also be emplo~;ed to acilier~e
the desired ob~ec~ of realizing an overeslimatea ma~ch
~-alue at least mos~ OL the t me.

Dessin représentatif

Désolé, le dessin représentatif concernant le document de brevet no 1236577 est introuvable.

États administratifs

2024-08-01 : Dans le cadre de la transition vers les Brevets de nouvelle génération (BNG), la base de données sur les brevets canadiens (BDBC) contient désormais un Historique d'événement plus détaillé, qui reproduit le Journal des événements de notre nouvelle solution interne.

Veuillez noter que les événements débutant par « Inactive : » se réfèrent à des événements qui ne sont plus utilisés dans notre nouvelle solution interne.

Pour une meilleure compréhension de l'état de la demande ou brevet qui figure sur cette page, la rubrique Mise en garde , et les descriptions de Brevet , Historique d'événement , Taxes périodiques et Historique des paiements devraient être consultées.

Historique d'événement

Description	Date
Inactive : CIB désactivée	2016-01-16
Inactive : CIB attribuée	2015-11-26
Inactive : CIB attribuée	2015-11-26
Inactive : CIB attribuée	2015-11-26
Inactive : CIB en 1re position	2015-11-26
Accordé par délivrance	1988-05-10
Inactive : Périmé (brevet sous l'ancienne loi) date de péremption possible la plus tardive	1985-11-06

Historique d'abandonnement

Il n'y a pas d'historique d'abandonnement

Titulaires au dossier

Les titulaires actuels et antérieures au dossier sont affichés en ordre alphabétique.

Titulaires actuels au dossier
INTERNATIONAL BUSINESS MACHINES CORPORATION

Titulaires antérieures au dossier
LALIT R. BAHL
ROBERT L. MERCER
STEVEN V. DEGENNARO

Les propriétaires antérieurs qui ne figurent pas dans la liste des « Propriétaires au dossier » apparaîtront dans d'autres documents au dossier.

Documents

Pour visionner les fichiers sélectionnés, entrer le code reCAPTCHA :

Pour visualiser une image, cliquer sur un lien dans la colonne description du document. Pour télécharger l'image (les images), cliquer l'une ou plusieurs cases à cocher dans la première colonne et ensuite cliquer sur le bouton "Télécharger sélection en format PDF (archive Zip)" ou le bouton "Télécharger sélection (en un fichier PDF fusionné)".

Liste des documents de brevet publiés et non publiés sur la BDBC .

Si vous avez des difficultés à accéder au contenu, veuillez communiquer avec le Centre de services à la clientèle au 1-866-997-1936, ou envoyer un courriel au Centre de service à la clientèle de l'OPIC.

Filtre

Télécharger sélection en format PDF (archive Zip)

Télécharger sélection (en un fichier PDF fusionné)

Description du Document	Date (aaaa-mm-jj)	Nombre de pages	Taille de l'image (Ko)
Revendications	1993-09-29	20	380
Page couverture	1993-09-29	1	15
Abrégé	1993-09-29	1	25
Dessins	1993-09-29	6	73
Description	1993-09-29	41	972

Sélection de la langue

Menus

Abrégé anglais

Historique d'événement

Historique d'abandonnement

Votre demande est en traitement.

Les informations demandèes seront
accessibles dans quelques instants.

Merci de patienter.

Sommaire du brevet 1236577

Abrégé anglais

Historique d'événement

Historique d'abandonnement

Votre demande est en traitement.Les informations demandèes serontaccessibles dans quelques instants.Merci de patienter.

Votre demande est en traitement.

Les informations demandèes seront
accessibles dans quelques instants.

Merci de patienter.