Patent 1246229 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. The text of the Claims and Abstract is posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent: (11) CA 1246229
(21) Application Number: 1246229
(54) English Title: APPARATUS AND METHOD FOR PRODUCING A LIST OF LIKELY CANDIDATE WORDS CORRESPONDING TO A SPOKEN INPUT
(54) French Title: APPAREIL ET METHODE POUR PRODUIRE UNE LISTE DE MOTS PROBABLES CORRESPONDANT A UN SIGNAL VOCAL
Status: Term Expired - Post Grant
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06F 1/00 (2006.01)
(72) Inventors:
  • BAHL, LALIT R. (United States of America)
  • DESOUZA, PETER V. (United States of America)
  • MERCER, ROBERT L. (United States of America)
(73) Owners:
  • INTERNATIONAL BUSINESS MACHINES CORPORATION
(71) Applicants:
  • INTERNATIONAL BUSINESS MACHINES CORPORATION (United States of America)
(74) Agent:
(74) Associate agent:
(45) Issued: 1988-12-06
(22) Filed Date: 1986-03-24
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data:
Application No. Country/Territory Date
738,930 (United States of America) 1985-05-29

Abstracts

English Abstract


A speech recognition apparatus and method of selecting likely words from a vocabulary of words, wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input, include: (a) forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote; (b) forming a second table in which each label is assigned a penalty for each word in the vocabulary, the penalty assigned to a given label for a given word being indicative of the likelihood of the given label not being produced according to the model for the given word; and (c) for a given string of labels, determining the likelihood of a particular word, which includes the step of combining the votes of all labels in the string for the particular word together with the penalties of all labels not in the string for the particular word.


Claims

Note: Claims are shown in the official language in which they were submitted.


The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. In a speech recognition system, a method of measuring the likelihood of a word corresponding to a spoken input where the word is from a vocabulary of words, the method comprising the steps of:
(a) generating a string of labels in response to a spoken input, each label (i) being from an alphabet of labels and (ii) representing a respective sound type;
(b) determining label votes, each label vote representing the likelihood that a respective label is produced when a given word is uttered; and
(c) for a subject word, accumulating a label vote for each of at least some of the labels generated in the string;
the accumulated label votes providing information indicative of the likelihood of the subject word.
2. The method of Claim 1 comprising the further step of:
(d) combining the accumulated label votes together to provide a likelihood score for the subject word.
3. The method of Claim 2 comprising the further steps of:
(e) repeating steps (c) and (d) for each word in the vocabulary; and
(f) selecting the n words having the highest likelihood scores as candidate words, where n is a predefined integer.
4. The method of Claim 2 wherein the vote determining step includes the step of (g) evaluating the log probability of the subject word producing the respective label; and
wherein the combining step includes the step of (h) summing up the determined label votes for the subject word; and
wherein the method comprises the further step of:
(j) determining a scaled likelihood score for the subject word as the sum of votes divided by the number of generated labels that are summed.
5. The method of Claim 4 comprising the further steps of:
(k) repeating steps (c), (d), (g), (h), and (j) for each word in the vocabulary; and
(l) selecting as candidate words the n words having the highest scaled likelihood scores where n is a predefined integer.
6. The method of Claim 1 comprising the further steps of:
(m) determining, for a subject word, a penalty for each label wherein the penalty for a given label represents the likelihood of the subject word not producing the given label; and
(n) combining together (i) the label votes and (ii) the penalties corresponding to the subject word, wherein each combined label vote corresponds to a label generated in response to the spoken input and wherein each combined penalty corresponds to a label not generated in response to the spoken input.
7. The method of Claim 6 comprising the further step of:
(o) repeating steps (a), (b), (c), (m), and (n) for each word as the subject word.
8. The method of Claim 7 comprising the further steps of:
(p) determining the label in the string that corresponds to the most likely end time of a word-representing utterance; and
(q) limiting the determining of votes and penalties to labels generated prior to the determined end time label.
9. The method of Claim 7 comprising the further steps of:
(r) setting a reference end time; and
(s) repeating the combining step (n) at successive intervals following the set reference end time.
10. The method of Claim 1 wherein the vote determining step (b) includes the steps of:
(t) forming each word as a sequence of models, each model being represented as a Markov model phone machine characterized as having (i) a plurality of states, (ii) a plurality of transitions extending from a state to a state, (iii) first means for storing a likelihood count for the occurrence of each transition, and (iv) second means for storing a likelihood count for each of at least some of the labels being produced at each of at least some of the transitions;
(u) determining transition likelihood counts and label likelihood counts from training data generated by the utterance of known speech input; and
(v) producing an expected label distribution for each word based on the determined transition likelihood counts and label likelihood counts, each label vote being derived from the expected label distribution.
11. In a speech recognition system having (i) an acoustic processor which generates acoustic labels, (ii) a vocabulary of words each of which is represented by a word model comprising a sequence of Markov model phone machines, and (iii) a set of trained statistics indicating the label output probabilities and transition probabilities of each phone machine in a word model, a method of selecting likely candidate words from the vocabulary comprising the steps of:
(a) for a subject word from the vocabulary, determining a respective label vote for each label wherein a label vote for a given label represents the likelihood of the subject word producing the given label; and
(b) for a given string of labels generated by the acoustic processor in response to an unknown spoken input, combining the label votes for the subject word for labels generated in the string.
12. The method of Claim 11 wherein the vote determining step includes the steps of:
(c) computing the expected number and distribution of labels for the subject word from the trained statistics; and
(d) computing the logarithmic probability distribution of labels for the subject word from the expected distribution of labels;
the logarithmic probability of the subject word producing a given label being the label vote of the given label for the subject word.
13. The method of Claim 12 comprising the further steps of:
(e) determining the likelihood of each label not being produced by the subject word as a label penalty for the subject word;
(f) applying a length of the label string to the subject word;
(g) for each label not occurring in the applied length, extracting the label penalty for the subject word; and
(h) combining the votes for all labels occurring in the applied length with the penalties for all labels not occurring in the applied length to form a likelihood score for the subject word.
14. The method of Claim 13 comprising the further step of:
(j) dividing the likelihood score for the subject word by the number of labels in the applied length to provide a scaled likelihood score.
15. The method of Claim 14 comprising the further steps of:
(k) repeating steps (a) through (h) for each word in the vocabulary as the subject word; and
(l) selecting as candidate words those words having the n highest scaled likelihood scores.
16. The method of Claim 15 wherein the applying step includes the steps of:
(m) setting a reference end time;
(n) repeating the combining step (h) at successive intervals following the set reference end time; and
(o) determining, for each word at each successive interval, a scaled likelihood score relative to the scaled likelihood scores of the other words at a given interval and assigning to each subject word the highest scaled relative likelihood score thereof.
17. The method of Claim 16 comprising the further step of:
(p) performing an approximate acoustic match on all words in a set of words from the vocabulary determined from step (l).
18. The method of Claim 17 wherein each phone machine has associated therewith (i) at least one transition and (ii) actual label output probabilities, each actual label probability representing the probability that a specific label is generated at a given transition in the phone machine, and wherein the acoustic match performing step includes the steps of:
(aa) forming simplified phone machines, which includes the step of replacing by a single specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine; and
(bb) determining the probability of a phone generating the labels in the string based on the simplified phone machine corresponding thereto.
19. A method as in Claim 18 wherein each sequence of phones corresponding to a vocabulary word represents a phonetic baseform, the method including the further step of:
(cc) arranging the baseforms into a tree structure in which baseforms share a common branch for as long as the baseforms have at least similar phonetic beginnings, each leaf of the tree structure corresponding to a complete baseform.
20. A method as in Claim 18 wherein the specific value assigned to all label probabilities for a particular label in a given phone machine is equal to at least the maximum actual label probability over all transitions for said particular label in the given phone machine.
21. A method as in Claim 20 wherein each phone machine also has associated therewith (c) a length distribution for each phone which indicates probability as a function of the number of labels produced by a particular phone, the method including the further step of:
(dd) converting the length distribution for each phone into a uniform probability length pseudo-distribution, which includes the step of ascribing a uniform value to the probability of a given phone producing any number of labels defined in the length distribution.
22. A method as in Claim 21 comprising the further steps of:
(ee) finding the minimum length of labels having a non-zero probability; and
(ff) confining the uniform pseudo-distribution to lengths at least as long as the minimum length, lengths less than the minimum length being assigned a probability of zero.
23. The method of Claim 12 wherein the combining step includes the step of:
(gg) summing the votes of labels in the string for the subject word.
24. A speech recognition method of selecting likely words from a vocabulary of words wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input, the method comprising the steps of:
(a) forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote.
25. The method of Claim 24 comprising the further step of:
(b) forming a second table in which each label is assigned a penalty for each word in the vocabulary, the penalty assigned to a given label for a given word being indicative of the likelihood of the given label not being produced according to the model for the given word.
26. The method of Claim 24 comprising the further step of:
(c) for a given string of labels, determining the likelihood of a particular word, which includes the step of combining the votes of all labels in the string for the particular word.
27. The method of Claim 25 comprising the further step of:
(d) for a given string of labels, determining the likelihood of a particular word, which includes the step of combining the votes of all labels in the string for the particular word together with the penalties of all labels not in the string for the particular word.
28. The method of Claim 27 comprising the further step of:
(e) repeating steps (a), (b), and (c) for all words as the particular word in order to provide a likelihood score for each word.
29. A speech recognition apparatus for selecting likely words from a vocabulary of words wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input, the apparatus comprising:
(a) means for forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote;
(b) means for forming a second table in which each label is assigned a penalty for each word in the vocabulary, the penalty assigned to a given label for a given word being indicative of the likelihood of the given label not being produced according to the model for the given word; and
(c) means for determining, for a given string of labels, the likelihood of a particular word, which includes means for combining the votes of all labels in the string for the particular word together with the penalties of all labels not in the string for the particular word.

Description

Note: Descriptions are shown in the official language in which they were submitted.


APPARATUS AND METHOD FOR PRODUCING A LIST OF LIKELY CANDIDATE WORDS CORRESPONDING TO A SPOKEN INPUT
FIELD OF THE INVENTION
The present invention relates generally to the art
of speech recognition and specifically to the art of
forming a short list of likely words selected from a
vocabulary of words.
DESCRIPTION OF PRIOR AND CONTEMPORANEOUS ART
The following cases relate to inventions which
provide background or environment for the present
invention: Canadian Patent Application No. 482,183,
entitled "Nonlinear Signal Processing In A Speech
Recognition System", filed May 23, 1985; Canadian
Patent Application No. 494,697, entitled "Apparatus And
Method For Performing Acoustic Matching", filed
November 6, 1985; and Canadian Patent Application
No. 496,161, entitled "Feneme Based Markov Models For
Words", filed November 26, 1985; all of which
applications have been assigned to International
Business Machines Corporation.
In a probabilistic approach to speech recognition,
an acoustic waveform is initially transformed into a
string of labels, or fenemes, by an acoustic processor.
The labels, each of which identifies a sound type, are
selected from an alphabet of typically approximately
200 different labels. The generating of such labels has been discussed in various articles such as "Continuous Speech Recognition by Statistical Methods", Proceedings of the IEEE, volume 64, pp. 532-556 (1976) and in the co-pending patent application entitled "Nonlinear Signal Processing in a Speech Recognition System".
In employing the labels to achieve speech recognition, Markov model phone machines (also referred to as probabilistic finite state machines) have been discussed. A Markov model normally includes a plurality of states and transitions between the states. In addition, the Markov model normally has probabilities assigned thereto relating to (a) the probability of each transition occurring and (b) the respective probability of producing each label at various transitions. The Markov model, or Markov source, has been described in various articles such as "A Maximum Likelihood Approach to Continuous Speech Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, volume PAMI-5, Number 2, March 1983, by L.R. Bahl, F. Jelinek, and R.L. Mercer.
In recognizing speech, a matching process is performed to determine which word or words in the vocabulary has the highest likelihood given the string of labels generated by the acoustic processor. One such matching procedure is set forth in the co-pending application entitled "Apparatus and Method for Performing Acoustic Matching". As set forth therein, acoustic matching is performed by (a) characterizing each word in a vocabulary by a sequence of Markov model phone machines and (b) determining the respective likelihood of each word-representing sequence of phone machines producing the string of labels generated by the acoustic processor. Each word-representing sequence of phone machines is referred to as a word baseform.
As discussed in the application entitled "Apparatus and Method for Performing Acoustic Matching", the word baseform may be constructed of phonetic phone machines. In this instance, each phone machine preferably corresponds to a phonetic sound and includes seven states and thirteen transitions.
Alternatively, each word baseform may be formed as a sequence of fenemic phone machines. A fenemic phone machine is a simpler Markov model than the phonetic phone machine and is preferably constructed with two states. Between the first state and the second state is a null transition in which no label can be produced. Also between the first state and the second state is a non-null transition in which a label from the alphabet of labels may be produced. At the first state is a self-loop at which labels may be produced. During a training phase, statistics for each fenemic phone machine are defined. That is, for each fenemic phone, a probability for each transition and a probability for each label being produced at each non-null transition are computed from known utterances. Word baseforms are formed by concatenating fenemic phone machines.
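As a concrete illustration, a fenemic phone machine reduces to a very small data structure. The sketch below (Python) captures the two states and three transitions just described; the probability values and alphabet size are stand-ins, not trained statistics.

```python
import numpy as np

N_LABELS = 200  # the text cites an alphabet of roughly 200 labels

class FenemicPhoneMachine:
    """Two-state Markov model: a self-loop at state 1 that can produce a
    label, a non-null transition from state 1 to state 2 that can produce
    a label, and a null transition from state 1 to state 2 that cannot."""
    def __init__(self):
        # Transition probabilities (stand-in values; computed from known
        # utterances during training in practice). The three sum to one.
        self.p_self, self.p_emit, self.p_null = 0.3, 0.5, 0.2
        # Label output distributions for the two non-null transitions.
        self.out_self = np.full(N_LABELS, 1.0 / N_LABELS)
        self.out_emit = np.full(N_LABELS, 1.0 / N_LABELS)

# A word baseform is a concatenation of fenemic phone machines.
baseform = [FenemicPhoneMachine() for _ in range(8)]  # 8 phones, illustrative
```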
In systems employing fenemic baseforms and phonetic baseforms, the end goal is to find the most likely word or words corresponding to a string of labels generated by an acoustic processor in response to an unknown speech input. One approach to achieving this goal is discussed in the above-cited application "Apparatus and Method for Performing Acoustic Matching". Specifically, the number of words is first reduced from the total number of words in the vocabulary to a list of likely candidate words which may be examined in a detailed match procedure and/or a language model procedure at which preferably the most likely word is selected. In reducing the number of candidate words, the methodology taught in the co-pending application involves applying approximations to the phone machines which result in faster processing without inordinate computational requirements. In achieving the reduction in words, the approximate acoustic match performs computations through a match lattice. The approximate acoustic match has proved to be effective and efficient in determining which words should be subjected to detailed matching and, as appropriate, processing in accordance with the language model.
SUMMARY OF THE INVENTION
The present invention teaches a different method of reducing the number of words that are to be examined in a detailed match. That is, the present invention relates to a polling method wherein a table is set up in which each label in the alphabet provides a "vote" for each word in the vocabulary. The vote reflects the likelihood that a given word has produced a given label. The votes are computed from the label output probability and transition probability statistics derived during a training session.
In accordance with one embodiment of the invention, when a string of labels is generated by an acoustic processor, a subject word is selected. From the vote table, each label in the string is recognized and the vote of each label corresponding to the subject word is determined. All votes of the labels for the subject word are accumulated and combined to provide a likelihood score. Repeating the process for each word in the vocabulary results in likelihood scores for each word. A list of likely candidate words may be derived from the likelihood scores.
In a second embodiment, a second table is also formed which includes a penalty that each label has for each word in the vocabulary. A penalty assigned to a given label indicates the likelihood of a word not producing the given label. In the second embodiment, both label votes and penalties are considered in determining the likelihood score for a given word based on a string of labels.
In order to account for length, the likelihood scores are preferably scaled based on the number of labels considered in evaluating a likelihood score for a word.
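A minimal sketch of this polling computation in Python, under assumptions the text leaves open here: the vote and penalty tables are stored as log probabilities (as the claims later describe for votes), and the table contents below are random stand-ins rather than trained statistics.

```python
import numpy as np

N_LABELS, N_WORDS = 200, 5000   # illustrative alphabet and vocabulary sizes

# votes[l, w]: log probability that word w produces label l (first table).
# penalties[l, w]: log probability that word w does not produce label l
# (second table). Both are stand-ins here.
rng = np.random.default_rng(0)
votes = np.log(rng.dirichlet(np.ones(N_LABELS), N_WORDS).T)
penalties = np.log1p(-np.exp(votes))

def scaled_likelihood_scores(label_string):
    """Combine, for every word, the votes of labels in the string with the
    penalties of labels not in the string, then scale by string length."""
    present = np.zeros(N_LABELS, dtype=bool)
    present[label_string] = True
    scores = votes[label_string].sum(axis=0)    # one vote per generated label
    scores += penalties[~present].sum(axis=0)   # one penalty per absent label
    return scores / len(label_string)           # length-scaled likelihood score

labels = rng.integers(0, N_LABELS, size=40)     # stand-in acoustic label string
candidates = np.argsort(scaled_likelihood_scores(labels))[-10:][::-1]
```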
Moreover, when the end time for a word is not defined along the string of generated labels, the present invention provides that likelihood scores be computed at successive time intervals so that a subject word may have a plurality of successive likelihood scores associated therewith. The invention further provides that the best likelihood score for the subject word --when compared relative to the likelihood scores of preferably all the other words in the vocabulary-- is assigned to the subject word.
In accordance with the invention, a method is taught for selecting likely words from a vocabulary of words wherein each word is represented by a sequence of at least one probabilistic finite state phone machine and wherein an acoustic processor generates acoustic labels in response to a spoken input. The method comprises the steps of: (a) forming a first table in which each label in the alphabet provides a vote for each word in the vocabulary, each label vote for a subject word indicating the likelihood of the subject word producing the label providing the vote. Further, the method preferably comprises the steps of: (b) forming a second table in which each label is assigned a penalty for each word in the vocabulary, the penalty assigned to a given label for a given word being indicative of the likelihood of the given label not being produced according to the model for the given word; and (c) for a given string of labels, determining the likelihood of a particular word which includes the step of combining the votes of all labels in the string for the particular word together with the penalties of all labels not in the string for the particular word.

Still further, the method preferably comprises the additional step of: (d) repeating steps (a), (b), and (c) for all words as the particular word in order to provide a likelihood score for each word.
If desired, the methodology discussed above is employed in conjunction with the approximate acoustic matching techniques set forth in the co-pending application "Apparatus and Method for Performing Acoustic Matching" and discussed hereinbelow.
The present invention provides a fast, computationally simple, effective technique for determining which words in a vocabulary have a relatively high likelihood of corresponding to a string of acoustic labels generated by an acoustic processor.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG.1 is a general block diagram of a system environment in which the present invention may be practiced.
FIG.2 is a block diagram of the system environment of FIG.1 wherein the stack decoder is shown in greater detail.
FIG.3 is an illustration of a detailed match phone machine which is identified in storage and represented therein by statistics obtained during a training session.
FIG.4 is an illustration depicting the elements of an acoustic processor.
FIG.5 is an illustration of a typical human ear indicating where components of an acoustic model are defined.
FIG.6 is a block diagram showing portions of the acoustic processor.
FIG.7 is a graph showing sound intensity versus frequency, the graph being used in the design of the acoustic processor.
FIG.8 is a graph showing the relationship between sones and phons.

FIG.9 is a flowchart representation showing how sound is characterized according to the acoustic processor of FIG.4.
FIG.10 is a flowchart representation showing how thresholds are up-dated in FIG.9.
FIG.11 is a trellis diagram, or lattice, of a detailed match procedure.
FIG.12 is a diagram depicting a phone machine used in performing matching.
FIG.13 is a time distribution diagram used in a matching procedure having certain imposed conditions.
FIG.14 (a) through (e) are diagrams which show the interrelationship between phones, a label string, and start and end times determined in the matching procedure.
FIG.15 (a) is a diagram showing a particular phone machine of minimum length zero and FIG.15 (b) is a time diagram corresponding thereto.
FIG.16 (a) is a phone machine corresponding to a minimum length four and FIG.16 (b) is a time diagram corresponding thereto.

FIG.17 is a diagram illustrating a tree structure of phones which permits processing of multiple words simultaneously.
FIG.18 is a flowchart outlining the steps performed in training phone machines for performing acoustic matching.
FIG.19 is an illustration showing successive steps of stack decoding.
FIG.20 is a graph depicting likelihood vectors for respective word paths and a likelihood envelope.
FIG.21 is a flowchart representing steps in a stack decoding procedure.
FIG.22 is a flowchart showing how a word path is extended with words obtained from acoustic matching.
FIG.23 is an illustration showing a fenemic phone machine.
FIG.24 is a trellis diagram for a sequential plurality of fenemic phone machines.
FIG.25 is a flowchart illustrating the polling methodology of the present invention.
FIG.26 is a graph illustrating the count distribution for labels.
FIG.27 is a graph illustrating the number of times each phone produced each label during training.
FIG.28 is an illustration showing the sequence of phones constituting each of two words.
FIG.29 is a graph illustrating the expected number of counts for a word for each label.
FIG.30 is a table showing the vote of each label for each word.
FIG.31 is a table showing the penalty of each label for each word.
FIG.32 is a block diagram illustrating apparatus of the invention.

DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
(I) Speech Recognition System Environment
A. General Description
In FIG.1, a general block diagram of a speech recognition system 1000 is illustrated. The system 1000 includes a stack decoder 1002 to which are connected an acoustic processor (AP) 1004, an array processor 1006 used in performing a fast approximate acoustic match, an array processor 1008 used in performing a detailed acoustic match, a language model 1010, and a work station 1012.
The acoustic processor 1004 is designed to transform a speech waveform input into a string of labels, or fenemes, each of which in a general sense identifies a corresponding sound type. In the present system, the acoustic processor 1004 is based on a unique model of the human ear and is described in the above-mentioned application entitled "Nonlinear Signal Processing in a Speech Recognition System".
The labels, or fenemes, from the acoustic processor 1004 enter the stack decoder 1002. In a logical sense, the stack decoder 1002 may be represented by the elements shown in FIG.2. That is, the stack decoder 1002 includes a search element 1020 which communicates with the work station 1012 and which communicates with the acoustic processor process, the fast match processor process, the detailed match process, and the language model process through respective interfaces 1022, 1024, 1026, and 1028.
In operation, fenemes from the acoustic processor 1004 are directed by the search element 1020 to the fast match processor 1006. The fast match procedure is described hereinbelow as well as in the application entitled "Apparatus and Method for Performing Acoustic Matching". Briefly, the object of matching is to determine the most likely word (or words) for a given string of labels.
The fast match is designed to examine words in a vocabulary of words and to reduce the number of candidate words for a given string of incoming labels. The fast match is based on probabilistic finite state machines, also referred to herein as Markov models.
Once the fast match reduces the number of candidate words, the stack decoder 1002 communicates with the language model 1010 which determines the contextual likelihood of each candidate word in the fast match candidate list based preferably on existing tri-grams. Preferably, the detailed match examines those words from the fast match candidate list which have a reasonable likelihood of being the spoken word based on the language model computations. The detailed match is discussed in the above-mentioned application entitled "Apparatus and Method for Performing Acoustic Matching".

The detailed match is performed by means of Markov model phone machines such as the machine illustrated in FIG.3. After the detailed match, the language model is, preferably, again invoked to determine word likelihood. The stack decoder 1002 of the present invention --using information derived from the fast matching, detailed matching, and applying the language model-- is designed to determine the most likely path, or sequence, of words for a string of generated labels.
Two prior art approaches for finding the most likely word sequence are Viterbi decoding and single stack decoding. Each of these techniques is described in the article entitled "A Maximum Likelihood Approach to Continuous Speech Recognition." Viterbi decoding is described in section V and single stack decoding in section VI of the article.
In the single stack decoding technique, paths of varying length are listed in a single stack according to likelihood and decoding is based on the single stack. Single stack decoding must account for the fact that likelihood is somewhat dependent on path length and, hence, normalization is generally employed.
The Viterbi technique does not require normalization and is generally practical for small tasks.
As another alternative, decoding may be performed with a small vocabulary system by examining each possible combination of words as a possible word sequence and determining which combination has the highest probability of producing the generated label string. The computational requirements for this technique become impractical for large vocabulary systems.
The stack decoder 1002, in effect, serves to control the other elements but does not perform many computations. Hence, the stack decoder 1002 preferably includes a 4341 running under the IBM VM/370 operating system as described in publications such as Virtual Machine/System Product Introduction Release 3 (1983). The array processors which perform considerable computation have been implemented with Floating Point Systems (FPS) 190L's, which are commercially available.
A novel technique which includes multiple stacking and a unique decision strategy for determining the best word sequence, or path, has been invented by L.R. Bahl, F. Jelinek, and R.L. Mercer and is discussed hereinbelow.
B. The Auditory Model and Implementation Thereof in an Acoustic Processor of a Speech Recognition System
In FIG.4 a specific embodiment of an acoustic processor 1100, as described above, is illustrated. An acoustic wave input (e.g., natural speech) enters an analog-to-digital converter 1102 which samples at a prescribed rate. A typical sampling rate is one sample every 50 microseconds. To shape the edges of the digital signal, a time window generator 1104 is provided. The output of the window 1104 enters a fast Fourier transform (FFT) element 1106 which provides a frequency spectrum output for each time window.
The output of the FFT element 1106 is then processed to produce labels y1 y2 ... yf. Four elements -- a feature selection element 1108, a cluster element 1110, a prototype element 1112, and a labeller 1114 -- coact to generate the labels. In generating the labels, prototypes are defined as points (or vectors) in the space based on selected features, and acoustic inputs are then characterized by the same selected features to provide corresponding points (or vectors) in space that can be compared to the prototypes.
Specifically, in defining the prototypes, sets of points are grouped together as respective clusters by cluster element 1110. Methods for defining clusters have been based on probability distributions --such as the Gaussian distribution-- applied to speech. The prototype of each cluster --relating to the centroid or other characteristic of the cluster-- is generated by the prototype element 1112. The generated prototypes and acoustic input --both characterized by the same selected features-- enter the labeller 1114. The labeller 1114 performs a comparing procedure which results in assigning a label to a particular acoustic input.
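In effect, the labeller performs a nearest-prototype comparison. A small sketch (Python); the feature dimension, prototype values, and Euclidean distance metric are assumptions for illustration only:

```python
import numpy as np

N_LABELS, N_FEATURES = 200, 22   # e.g. one feature per critical frequency band

# Stand-in prototypes: one feature vector per label; in practice these come
# from clustering training frames (e.g. under Gaussian assumptions).
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((N_LABELS, N_FEATURES))

def label_frames(frames):
    """Assign each acoustic feature frame the label of its nearest prototype."""
    d2 = ((frames[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)      # one label per time frame

frames = rng.standard_normal((300, N_FEATURES))  # stand-in feature frames
label_string = label_frames(frames)              # the string y1 y2 ... yf
```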

The selection of appropriate features is a key factor in deriving labels which represent the acoustic (speech) wave input. The presently described acoustic processor includes an improved feature selection element 1108. In accordance with the present acoustic processor, an auditory model is derived and applied in an acoustic processor of a speech recognition system. In explaining the auditory model, reference is made to FIG.5.
FIG.5 shows part of the inner human ear. Specifically, an inner hair cell 1200 is shown with end portions 1202 extending therefrom into a fluid-containing channel 1204. Upstream from inner hair cells are outer hair cells 1206 also shown with end portions 1208 extending into the channel 1204. Associated with the inner hair cell 1200 and outer hair cells 1206 are nerves which convey information to the brain. Specifically, neurons undergo electrochemical changes which result in electrical impulses being conveyed along a nerve to the brain for processing. Effectuation of the electrochemical changes is stimulated by the mechanical motion of the basilar membrane 1210.
It has been recognized, in prior teachings, that the basilar membrane 1210 serves as a frequency analyzer for acoustic waveform inputs and that portions along the basilar membrane 1210 respond to respective critical frequency bands. That different portions of the basilar membrane 1210 respond to corresponding frequency bands has an impact on the loudness perceived for an acoustic waveform input. That is, the loudness of tones is perceived to be greater when two tones are in different critical frequency bands than when two tones of similar power intensity occupy the same frequency band. It has been found that there are on the order of twenty-two critical frequency bands defined by the basilar membrane 1210.
Conforming to the frequency-response of the basilar membrane 1210, the present acoustic processor 1100 in its preferred form physically defines the acoustic waveform input into some or all of the critical frequency bands and then examines the signal component for each defined critical frequency band separately. This function is achieved by appropriately filtering the signal from the FFT element 1106 (see FIG.5) to provide a separate signal in the feature selection element 1108 for each examined critical frequency band.
The separate inputs, it is noted, have also been blocked into time frames (of preferably 25.6 msec) by the time window generator 1104. Hence, the feature selection element 1108 preferably includes twenty-two signals -- each of which represents sound intensity in a given frequency band for one frame in time after another.
The filtering is preferably performed by a conventional critical band filter 1300 of FIG.6. The separate signals are then processed by an equal loudness converter 1302 which accounts for perceived loudness variations as a function of frequency. In this regard, it is noted that a first tone at a given dB level at one frequency may differ in perceived loudness from a second tone at the same given dB level at a second frequency. The converter 1302 can be based on empirical data, converting the signals in the various frequency bands so that each is measured by a similar loudness scale. For example, the converter 1302 preferably maps from acoustic power to equal loudness based on studies of Fletcher and Munson in 1933, subject to certain modifications. The modified results of these studies are depicted in FIG.7. In accordance with FIG.7, a 1KHz tone at 40dB is comparable in loudness level to a 100Hz tone at 60dB as shown by the X in the figure.
The converter 1302 adjusts loudness preferably in accordance with the contours of FIG.7 to effect equal loudness regardless of frequency.
In addition to dependence on frequency, power changes and loudness changes do not correspond as one looks at a single frequency in FIG.7. That is, variations in the sound intensity, or amplitude, are not at all points reflected by similar changes in perceived loudness. For example, at 100 Hz, the perceived change in loudness of a 10dB change at about 110dB is much larger than the perceived change in loudness of a 10dB change at 20dB. This difference is addressed by a loudness scaling element 1304 which compresses loudness in a predefined fashion. Preferably, the loudness scaling element compresses power P by a cube-root factor to P^(1/3) by replacing loudness amplitude measure in phons by sones.

FIG.8 illustrates a known representation of phons versus sones determined empirically. By employing sones, the present model remains substantially accurate at large speech signal amplitudes. One sone, it should be recognized, has been defined as the loudness of a 1KHz tone at 40dB.
Referring again to FIG.6, a novel time varying response element 1306 is shown which acts on the equal loudness, loudness scaled signals associated with each critical frequency band. Specifically, for each frequency band examined, a neural firing rate f is determined at each time frame. The firing rate f is defined in accordance with the present processor as

f = (So + DL)n    (1)

where n is an amount of neurotransmitter; So is a spontaneous firing constant which relates to neural firings independent of acoustic waveform input; L is a measurement of loudness; and D is a displacement constant. (So)n corresponds to the spontaneous neural firing rate which occurs whether or not there is an acoustic wave input and DLn corresponds to the firing rate due to the acoustic wave input.
Significantly, the value of n is characterized by the present acoustic processor as changing over time according to the relationship:

dn/dt = Ao - (So + Sh + DL)n    (2)

where Ao is a replenishment constant and Sh is a spontaneous neurotransmitter decay constant. The novel relationship set forth in equation (2) takes into account that neurotransmitter is being produced at a certain rate (Ao) and is lost (a) through decay (Sh x n), (b) through spontaneous firing (So x n), and (c) through neural firing due to acoustic wave input (DL x n). The presumed locations of these modelled phenomena are illustrated in FIG.5.
Equation (2) also reflects the fact that the present acoustic processor is non-linear in that the next amount of neurotransmitter and the next firing rate are dependent multiplicatively on the current conditions of at least the neurotransmitter amount. That is, the amount of neurotransmitter at time (t+Δt) is equal to the amount of neurotransmitter at time t plus (dn/dt)Δt, or:

n(t+Δt) = n(t) + (dn/dt)Δt    (3)

Equations (1), (2), and (3) describe a time varying signal analyzer which, it is suggested, addresses the fact that the auditory system appears to be adaptive over time, causing signals on the auditory nerve to be non-linearly related to acoustic wave input. In this regard, the present acoustic processor provides the first model which embodies non-linear signal processing in a speech recognition system, so as to better conform to apparent time variations in the nervous system.
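Equations (1)-(3) amount to a simple per-band state update. A numerical sketch (Python), using the constant values So, Sh, D, Ao derived later in this section; the loudness sequence is a stand-in:

```python
# Discrete-time sketch of the auditory model:
#   f = (So + D*L) * n                  (1) firing rate
#   dn/dt = Ao - (So + Sh + D*L) * n    (2) neurotransmitter dynamics
#   n(t+dt) = n(t) + (dn/dt) * dt       (3) state update
So, Sh, D, Ao = 0.0888, 0.11111, 0.00666, 1.0   # values derived below
dt = 1.0                                         # one time frame per step

def step(n, L):
    """One frame: return (firing rate, next neurotransmitter amount)."""
    f = (So + D * L) * n                         # equation (1)
    dn_dt = Ao - (So + Sh + D * L) * n           # equation (2)
    return f, n + dn_dt * dt                     # equation (3)

n = Ao / (So + Sh)        # steady-state amount reached with no input (L = 0)
for L in [0.0, 0.0, 5.0, 20.0, 20.0, 0.0]:       # stand-in loudness (sones)
    f, n = step(n, L)
```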
In order to reduce the number of unknowns in equations (1) and (2), the present acoustic processor uses the following equation (4) which applies to fixed loudness L:

So + Sh + DL = 1/T    (4)

T is a measure of the time it takes for an auditory response to drop to 37% of its maximum after an audio wave input is generated. T, it is noted, is a function of loudness and is, according to the present acoustic processor, derived from existing graphs which display the decay of the response for various loudness levels. That is, when a tone of fixed loudness is generated, it generates a response at a first high level after which the response decays toward a steady condition level with a time constant T. With no acoustic wave input, T=To which is on the order of 50 msec. For a loudness of Lmax, T=Tmax which is on the order of 30 msec.

By setting Ao = 1, 1/(So + Sh) is determined to be 5 csec, when L = 0. When L is Lmax and Lmax = 20 sones, equation (5) results:

So + Sh + D(20) = 1/30    (5)

With the above data and equations, So and Sh are defined by equations (6) and (7) as:

So = DLmax/(R + (DLmax x To x R) - 1)    (6)

Sh = 1/To - So    (7)

where

R = f_steady-state|L=Lmax / f_steady-state|L=0    (8)

f_steady-state represents the firing rate at a given loudness when dn/dt is zero.
R, it is noted, is the only variable left in the acoustic processor. Hence, to alter the performance of the processor, only R is changed. R, that is, is a single parameter which may be adjusted to alter performance which, normally, means minimizing steady state effects relative to transient effects. It is desired to minimize steady state effects because inconsistent output patterns for similar speech inputs generally result from differences in frequency response, speaker differences, background noise, and distortion which affect the steady state portions of the speech signal but not the transient portions. The value of R is preferably set by optimizing the error rate of the complete speech recognition system. A suitable value found in this way is R = 1.5. Values of So and Sh are then 0.0888 and 0.11111 respectively, with D being derived as 0.00666.
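These constants follow from equations (4), (6), and (7) with the quoted time constants; a quick check (times in centiseconds, so To = 5 and Tmax = 3) reproduces them:

```python
To, Tmax = 5.0, 3.0    # time constants in csec (50 msec and 30 msec)
Lmax, R = 20.0, 1.5    # maximum loudness in sones; tuned parameter R

D = (1.0 / Tmax - 1.0 / To) / Lmax            # from equation (4) at L=0 and L=Lmax
So = D * Lmax / (R + D * Lmax * To * R - 1)   # equation (6)
Sh = 1.0 / To - So                            # equation (7)

print(So, Sh, D)   # ~0.0888, ~0.1111, ~0.00667, matching the values above
```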
Referring to FIG.9, a flowchart of the present acoustic processor is depicted. Digitized speech in a 25.6 msec time frame, sampled at preferably 20KHz, passes through a Hanning Window 1320 the output from which is subject to a Fourier Transform 1322, taken at preferably 10 msec intervals. The transform output is filtered by element 1324 to provide a power density output for each of at least one frequency band -- preferably all the critical frequency bands or at least twenty thereof. The power density is then transformed from log magnitude 1326 to loudness level. This is readily performed according to the modified graph of FIG.7. The process outlined hereafter which includes threshold up-dating of step 1330 is depicted in FIG.10.
In FIG.10, a threshold-of-feeling Tf and a threshold-of-hearing Th are initially defined (at step 1340) for each filtered frequency band m to be 120dB and 0dB respectively. Thereafter, a speech counter, total frames register, and a histogram register are reset at step 1342.
Each histogram includes bins, each of which indicates the number of samples or counts during which power or some similar measure -- in a given frequency band -- is in a respective range. A histogram in the present instance preferably represents -- for each given frequency band -- the number of centiseconds during which loudness is in each of a plurality of loudness ranges. For example, in the third frequency band, there may be twenty centiseconds between 10dB and 20dB in power. Similarly, in the twentieth frequency band, there may be one hundred fifty out of a total of one thousand centiseconds between 50dB and 60dB. From the total number of samples (or centiseconds) and the counts contained in the bins, percentiles are derived.
A frame from the filter output of a respective frequency band is examined at step 1344 and bins in the appropriate histograms -- one per filter -- are incremented at step 1346. The total number of bins in which the amplitude exceeds 55dB are summed for each filter (i.e. frequency band) at step 1348 and the number of filters indicating the presence of speech is determined. If there is not a minimum of filters (e.g. six of twenty) to suggest speech, the next frame is examined at step 1344. If there are enough filters to indicate speech at step 1350, a speech counter is incremented at step 1352. The speech counter is incremented at step 1352 until 10 seconds of speech have occurred at step 1354 whereupon new values for Tf and Th are defined for each filter at step 1356.
The new Tf and Th values are determined for a given filter as follows. For Tf, the dB value of the bin holding the 35th sample from the top of 1000 bins (i.e. the 96.5th percentile of speech) is defined as BINH. Tf is then set as: Tf = BINH + 40dB. For Th, the dB value of the bin holding the (.01) x (TOTAL BINS - SPEECH COUNT)th value from the lowest bin is defined as BINL. That is, BINL is the bin in the histogram which is 1% of the number of samples in the histogram excluding the number of samples classified as speech. Th is then defined as: Th = BINL + 30dB.
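A sketch of this per-filter update (Python); the 35th-sample and 1% rules and the +40dB/+30dB offsets are from the text, while the one-bin-per-dB histogram layout is an assumption:

```python
import numpy as np

def update_thresholds(hist, bin_db, speech_count):
    """New (Tf, Th) for one filter from its loudness histogram, as in FIG.10."""
    # BINH: bin holding the 35th sample from the top (96.5th percentile of 1000)
    from_top = np.cumsum(hist[::-1])
    binh = bin_db[::-1][np.searchsorted(from_top, 35)]
    # BINL: bin holding the 1% sample from the bottom, excluding speech frames
    from_bottom = np.cumsum(hist)
    k = max(1, int(0.01 * (hist.sum() - speech_count)))
    binl = bin_db[np.searchsorted(from_bottom, k)]
    return binh + 40.0, binl + 30.0   # Tf = BINH + 40dB, Th = BINL + 30dB

bin_db = np.arange(0, 121)            # 0..120 dB, one bin per dB (assumed)
hist = np.random.default_rng(0).multinomial(1000, np.ones(121) / 121)
Tf, Th = update_thresholds(hist, bin_db, speech_count=200)
```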
Returning to FIG.9, the sound amplitudes are converted to sones and scaled based on the updated thresholds (steps 1330 and 1332) as described hereinbefore. An alternative method of deriving sones and scaling is by taking the filter amplitudes "a" (after the bins have been incremented) and converting to dB according to the expression:

a_dB = 20 log10(a) - 10    (9)

Each filter amplitude is then scaled to a range between 0 and 120 to provide equal loudness according to the expression:

a_eql = 120(a_dB - Th)/(Tf - Th)    (10)

a_eql is then preferably converted from a loudness level (phons) to an approximation of loudness in sones (with a 1KHz signal at 40dB mapping to 1) by the expression:

L_dB = (a_eql - 30)/4    (11)

Loudness in sones is then approximated as:

Ls(appr) = 10^(L_dB/20)    (12)

The loudness in sones Ls is then provided as input to the equations (1) and (2) at step 1334 to determine the output firing rate f for each frequency band (step 1335). With twenty-two frequency bands, a twenty-two dimension vector characterizes the acoustic wave inputs over successive time frames. Generally, however, twenty frequency bands are examined by employing a conventional mel-scaled filter bank.
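The chain of equations (9)-(12) written out directly (a sketch; `a` is a raw filter amplitude and Tf, Th are the current thresholds for that band):

```python
import numpy as np

def amplitude_to_sones(a, Tf, Th):
    """Equations (9)-(12): raw filter amplitude to approximate loudness in sones."""
    a_db = 20.0 * np.log10(a) - 10.0            # (9)  amplitude to dB
    a_eql = 120.0 * (a_db - Th) / (Tf - Th)     # (10) equal-loudness scaling, 0..120
    l_db = (a_eql - 30.0) / 4.0                 # (11) loudness level conversion
    return 10.0 ** (l_db / 20.0)                # (12) approximate sones

Ls = amplitude_to_sones(a=50.0, Tf=95.0, Th=15.0)   # stand-in values
```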
Prior to processing the next time frame (step 1336), the next state of n is determined in accordance with equation (3) in step 1337.
The acoustic processor hereinbefore described is subject to improvement in applications where the firing rate f and neurotransmitter amount n have large DC pedestals. That is, where the dynamic range of the terms of the f and n equations is important, the following equations are derived to reduce the pedestal height.

In the steady state, and in the absence of an acoustic wave input signal (L = 0), equation (2) can be solved for a steady-state internal state n':

n' = Ao/(So + Sh)    (13)

The internal state of the neurotransmitter amount n(t) can be represented as a steady state portion and a varying portion:

n(t) = n' + n''(t)    (14)

Combining equations (1) and (14), the following expression for the firing rate results:

f(t) = (So + D x L)(n' + n''(t))    (15)

The term So x n' is a constant, while all other terms include either the varying part of n or the input signal represented by (D x L). Future processing will involve only the squared differences between output vectors, so that constant terms may be disregarded. Including equation (13) for n', we get

f''(t) = (So + D x L) x (n''(t) + (D x L x Ao)/(So + Sh))    (16)

Considering equation (3), the next state becomes:

n(t + Δt) = n'(t + Δt) + n''(t + Δt)    (17)

= n''(t) + Ao - (So + Sh + D x L) x (n' + n''(t))    (18)

= n''(t) - Sh x n''(t) - (So + D x L) x n''(t) - (D x L x Ao)/(So + Sh) + Ao - ((So x Ao) + (Sh x Ao))/(So + Sh)    (19)

This equation (19) may be rewritten, ignoring all constant terms, as:

n''(t + Δt) = n''(t)(1 - So x Δt) - f''(t)    (20)

Equations (15) and (20) now constitute the output equations and state-update equations applied to each filter during each 10 millisecond time frame. The result of applying these equations is a 20 element vector each 10 milliseconds, each element of the vector corresponding to a firing rate for a respective frequency band in the mel-scaled filter bank.
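Taken together, per filter and per 10 msec frame, the pedestal-removed model is a two-line update. This sketch follows the reconstruction of equations (16) and (20) given above, which is uncertain where the source scan is illegible:

```python
So, Sh, D, Ao = 0.0888, 0.11111, 0.00666, 1.0   # constants derived earlier
dt = 1.0                                         # one 10 msec frame per step

def step_varying(n2, L):
    """Output and state update for the varying part n'' of one filter."""
    f2 = (So + D * L) * (n2 + (D * L * Ao) / (So + Sh))   # equation (16)
    return f2, n2 * (1.0 - So * dt) - f2                  # equation (20)

# One 20-element output vector per frame, one element per mel-scaled band.
n2 = [0.0] * 20                          # varying parts start at zero
loudness = [5.0] * 20                    # stand-in per-band loudness (sones)
outputs = []
for i in range(20):
    f2, n2[i] = step_varying(n2[i], loudness[i])
    outputs.append(f2)
```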
With respect to the embodiment set forth immediately hereinabove, the flowchart of FIG.9 applies except that the equations for f, dn/dt, and n(t+Δt) are replaced by equations (16) and (20) which define special case expressions for firing rate f and next state n(t+Δt) respectively.
It is to be noted that the values attributed to the terms in the various equations (namely To = 5 csec, Tmax = 3 csec, Ao = 1, R = 1.5, and Lmax = 20) may be set otherwise and the terms So, Sh, and D may differ from the preferable derived values of 0.0888, 0.11111, and 0.00666, respectively, as other terms are set differently.
The present acoustic model has been practiced using the PL/I programming language with Floating Point Systems FPS 190L hardware, however it may be practiced by various other software or hardware approaches.
C. Detailed Match
In FIG.3, a sample detailed match phone machine 2000 is depicted. Each detailed match phone machine is a probabilistic finite-state machine characterized by (a) a plurality of states Si, (b) a plurality of transitions tr(Sj|Si), some of the transitions extending between different states and some extending from a state back to itself, each transition having associated therewith a corresponding probability, and (c) for each label that can be generated at a particular transition, a corresponding actual label probability.
In FIG.3, seven states S1 through S7 are provided and thirteen transitions tr1 through tr13 are provided in the detailed match phone machine 2000. A review of FIG.3 shows that phone machine 2000 has three transitions with dashed line paths, namely transitions tr11, tr12, and tr13. At each of these three transitions, the phone can change from one state to another without producing a label and such a transition is, accordingly, referred to as a null transition. Along transitions tr1 through tr10 labels can be produced. Specifically, along each transition tr1 through tr10, one or more labels may have a distinct probability of being generated thereat. Preferably, for each transition there is a probability associated with each label that can be generated in the system. That is, if there are two hundred labels that can be selectively generated by the acoustic channel, each transition (that is not a null) has two hundred "actual label probabilities" associated therewith --each of which corresponds to the probability that a corresponding label is generated by the phone at the particular transition. The actual label probabilities for transition tr1 are represented by the symbol p followed by the bracketed column of numerals 1 through 200, each numeral representing a given label. For label 1, there is a probability p[1] that the detailed phone machine 2000 generates the label 1 at transition tr1. The various actual label probabilities are stored with relation to the label and a corresponding transition.
When a string of labels y1y2y3... is presented to a detailed match phone machine 2000 corresponding to a given phone, a match procedure is performed. The procedure associated with the detailed match phone machine is explained with reference to FIG.11.
FIG.11 is a trellis diagram of the phone machine of FIG.3. As in the phone machine, the trellis diagram shows a null transition from state S1 to state S7 and transitions from state S1 to state S2 and from state S1 to state S4. The transitions between other states are also illustrated. The trellis diagram also shows time measured in the horizontal direction. Start-time probabilities q0 and q1 represent the probabilities that a phone has a start time at time t=t0 or t=t1, respectively, for the phone. At each start time t0 and t1, the various transitions are shown. It should be noted, in this regard, that the interval between successive start (and end) times is preferably equal in length to the time interval of a label.
In employing the detailed match phone machine 2000 to determine how closely a given phone matches the labels of an incoming string, an end-time distribution for the phone is sought and used in determining a match value for the phone. The notion of relying on the end-time distribution is common to all embodiments of phone machines discussed herein relative to a matching procedure. In generating the end-time distribution to perform a detailed match, the detailed match phone machine 2000 involves computations which are exact and complicated.
Looking at the trellis diagram of FIG.11, we first consider the computations required to have both a start time and end time at time t=t0. For this to be the case according to the example phone machine structure set forth in FIG.3, the following probability applies:

Pr(S7, t=t0) = q0 x T(1→7) + Pr(S2, t=t0) x T(2→7) + Pr(S3, t=t0) x T(3→7)    (21)

where Pr represents "probability of" and T represents the transition probability between the two parenthetically identified states. The above equation indicates the respective probabilities for the three conditions under which the end time can occur at time t=t0. Moreover, it is observed that the end time at t=t0 is limited in the current example to occurrence at state S7.
Looking next at the end time t=t1, it is noted that a calculation relating to every state other than state S1 must be made. The state S1 starts at the end time of the previous phone. For purposes of explanation, only the calculations pertaining to state S4 are set forth.

For state S4, the calculation is:

Pr(S4,t=t1) = Pr(S1,t=t0) x T(1→4) x Pr(y1|1→4) + Pr(S4,t=t0) x T(4→4) x Pr(y1|4→4)   (22)
In words, the equation (22) set forth immediately above indicates that the probability of the phone machine being in state S4 at time t=t1 is dependent on the sum of the following two terms: (a) the probability of being at state S1 at time t=t0 multiplied by the probability (T) of the transition from state S1 to state S4 multiplied further by the probability (Pr) of a given label y1 being generated given a transition from state S1 to state S4 and (b) the probability of being at state S4 at time t=t0 multiplied by the probability of the transition from state S4 to itself and further multiplied by the probability of generating the given label y1 during and given the transition from state S4 to itself.
Similarly, calculations pertaining to the other states (excluding state S1) are also performed to generate corresponding probabilities that the phone is at a particular state at time t=t1. Generally, in determining the probability of being at a subject state at a given time, the detailed match (a) recognizes each previous state that has a transition which leads to the subject state and the respective probability of each such previous state; (b) recognizes, for each such previous state, a value representing the probability of the label that must be generated at the transition between each such previous state and the current state in order to conform to the label string; and (c) combines the probability of each previous state and the respective value representing the label probability to provide a subject state probability over a corresponding transition. The overall probability of being at the subject state is determined from the subject state probabilities over all transitions leading thereto. The calculation for state S7, it is noted, includes terms relating to the three null transitions which permit the phone to start and end at time t=t1 with the phone ending in state S7.
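Purely by way of illustration, the general step just described can be sketched in Python. The data layout, argument names, and the numbering of the initial state as 1 are assumptions made for this sketch, not the APAL implementation referred to below.

    def trellis_step(prev, start_prob, label, transitions, null_transitions):
        # One time step of the detailed match (cf. equations (21) and (22)).
        # prev: dict mapping state -> probability at the previous time.
        # start_prob: probability that the phone starts at the current time,
        #   injected at the initial state (numbered 1 in this sketch).
        # label: the label observed between the previous and current times.
        # transitions: iterable of (src, dst, t_prob, label_probs) for
        #   non-null transitions; label_probs[label] is the actual label
        #   probability at that transition.
        # null_transitions: iterable of (src, dst, t_prob), assumed listed
        #   so that a null transition's source is computed before its use.
        cur = {1: start_prob}
        for src, dst, t_prob, label_probs in transitions:
            p = prev.get(src, 0.0) * t_prob * label_probs.get(label, 0.0)
            cur[dst] = cur.get(dst, 0.0) + p
        for src, dst, t_prob in null_transitions:
            # Null transitions produce no label and advance no time.
            cur[dst] = cur.get(dst, 0.0) + cur.get(src, 0.0) * t_prob
        return cur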
As with the probability determinations relative to times t=t0 and t=t1, probability determinations for a series of other end times are preferably generated to form an end-time distribution. The value of the end-time distribution for a given phone provides an indication of how well the given phone matches the incoming labels.

In determining how well a word matches a string of incoming labels, the phones which represent the word are processed in sequence. Each phone generates an end-time distribution of probability values. A match value for the phone is obtained by summing up the end-time probabilities and then taking the logarithm of that sum. A start-time distribution for the next phone is derived by normalizing the end-time distribution by, for example, scaling each value thereof by dividing each value by the sum so that the sum of scaled values totals one.
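A minimal sketch of these two steps, under the assumption that an end-time distribution is simply a list of probabilities (the function names are invented for illustration):

    import math

    def phone_match_value(end_time_dist):
        # Match value for one phone: the log of the summed end-time
        # probabilities, as described above.
        total = sum(end_time_dist)
        return math.log10(total) if total > 0 else float("-inf")

    def next_start_time_dist(end_time_dist):
        # Start-time distribution for the next phone: the end-time
        # distribution rescaled so that its values sum to one.
        total = sum(end_time_dist)
        return [v / total for v in end_time_dist] if total > 0 else end_time_dist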
It should be realized that there are at least two methods of determining h, the number of phones to be examined for a given word or word string. In a depth first method, computation is made along a baseform --computing a running subtotal with each successive phone. When the subtotal is found to be below a predefined threshold for a given phone position therealong, the computation terminates. Alternatively, in a breadth first method, a computation for similar phone positions in each word is made. The computations following the first phone in each word, the second phone in each word, and so on are made. In the breadth first method, the computations along the same number of phones for the various words are compared at the same relative phone positions therealong. In either method, the word(s) having the largest sum of match values is the sought object.
The detailed match has been implemented in APAL (Array Processor Assembly Language) which is the native assembler for the Floating Point Systems, Inc. 190L.

It should be recognized that the detailed match requires considerable memory for storing each of the actual label probabilities (i.e., the probability that a given phone generates a given label y at a given transition); the transition probabilities for each phone machine; and the probabilities of a given phone being at a given state at a given time after a defined start time. The above-noted FPS 190L is set up to make the various computations of end times, match values based on, for example, a sum --preferably the logarithmic sum of end time probabilities; start times based on the previously generated end time probabilities; and word match scores based on the match values for sequential phones in a word. In addition, the detailed match preferably accounts for "tail probabilities" in the matching procedure. A tail probability measures the likelihood of successive labels without regard to words. In a simple embodiment, a given tail probability corresponds to the likelihood of a label following another label. This likelihood is readily determined from strings of labels generated by, for example, some sample speech.
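As a hedged illustration of the simple bigram embodiment just described (names and smoothing choices are assumptions, not the patent's method), the tail probabilities could be estimated from sample label strings as follows:

    from collections import defaultdict

    def tail_probabilities(sample_label_strings):
        # Estimate Pr(next label | current label) by counting label
        # bigrams in sample speech.
        counts = defaultdict(lambda: defaultdict(int))
        for string in sample_label_strings:
            for a, b in zip(string, string[1:]):
                counts[a][b] += 1
        probs = {}
        for a, followers in counts.items():
            total = sum(followers.values())
            probs[a] = {b: c / total for b, c in followers.items()}
        # On the order of 200 x 200 entries for a 200-label alphabet.
        return probs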
Hence, the detailed match provides sufficient storage to contain baseforms, statistics for the Markov models, and tail probabilities. For a 5000 word vocabulary where each word comprises approximately ten phones, the baseforms have a memory requirement of 5000 x 10. Where there are 70 distinct phones (with a Markov model for each phone) and 200 distinct labels and ten transitions at which any label has a probability of being produced,
the statistics would require 70 x 10 x 200 locations. However, it is preferred that the phone machines are divided into three portions --a start portion, a middle portion, and an end portion-- with statistics corresponding thereto. (The three self-loops are preferably included in successive portions.) Accordingly, the storage requirements are reduced to 70 x 3 x 200. With regard to the tail probabilities, 200 x 200 storage locations are needed. In this arrangement, 50K integer and 82K floating point storage performs satisfactorily.
It should be noted that the detailed match may be implemented by using fenemic, rather than phonetic, phones. Appendix 1 provides a program listing that corresponds to the main computational kernel of a fenemic detailed match. The routine in Appendix 1 extends a lattice --which corresponds to a fenemic baseform of a current word-- forward in time by a single time step. The subroutine EXTLOOP is the main loop. Therebefore, the pipeline is started up and partial computations needed for the main loop are performed. After the main loop, partial computations remaining in the computational pipeline are emptied.
D. Basic Fast Match
Because the detailed match is computationally expensive, a basic fast match and an alternative fast match which reduce the computation requirements with some moderate sacrifice in accuracy are provided. The fast match is

preferably used in conjunction with the detailed match, the fast match listing likely candidate words from the vocabulary, and the detailed match being performed on, at most, the candidate words on the list.
A fast approximate acoustic matching technique is the subject of the co-pending patent application entitled "Apparatus and Method of Performing Acoustic Matching". In the fast approximate acoustic match, preferably each phone machine is simplified by replacing the actual label probability for each label at all transitions in a given phone machine with a specific replacement value. The specific replacement value is preferably selected so that the match value for a given phone when the replacement values are used is an overestimation of the match value achieved by the detailed match when the replacement values do not replace the actual label probabilities. One way of assuring this condition is by selecting each replacement value so that no probability corresponding to a given label in a given phone machine is greater than the replacement value thereof. By substituting the actual label probabilities in a phone machine with corresponding replacement values, the number of required computations in determining a match score for a word is reduced greatly. Moreover, since the replacement value is preferably an overestimation, the resulting match score is not less than would have previously been determined without the replacement.
In a specific embodiment of performing an acoustic match in a linguistic decoder with Markov models, each phone
machine therein is characterized --by training-- to have (a) a plurality of states and transition paths between states, (b) transitions tr(Sj|Si) having probabilities T(i→j) each of which represents the probability of a transition to a state Sj given a current state Si where Si and Sj may be the same state or different states, and (c) actual label probabilities wherein each actual label probability p(yk|i→j) indicates the probability that a label yk is produced by a given phone machine at a given transition from one state to a subsequent state where k is a label identifying notation; each phone machine including (a) means for assigning to each yk in said each phone machine a single specific value p'(yk) and (b) means for replacing each actual output probability p(yk|i→j) at each transition in a given phone machine by the single specific value p'(yk) assigned to the corresponding yk. Preferably, the replacement value is at least as great as the maximum actual label probability for the corresponding yk label at any transition in a particular phone machine. The fast match embodiments are employed to define a list of on the order of ten to one hundred candidate words selected as the most likely words in the vocabulary to correspond to the incoming labels. The candidate words are preferably subjected to the language model and to the detailed match. By paring the number of words considered by the detailed match to on the order of 1% of the words in the vocabulary, the computational cost is greatly reduced while accuracy is maintained.

The basic fast match simplifies the detailed match by replacing with a single value the actual label probabilities for a given label at all transitions at which the given label may be generated in a given phone machine. That is, regardless of the transition in a given phone machine whereat a label has a probability of occurring, the probability is replaced by a single specific value. The value is preferably an overestimate, being at least as great as the largest probability of the label occurring at any transition in the given phone machine.

By setting the label probability replacement value as the maximum of the actual label probabilities for the given label in the given phone machine, it is assured that the match value generated with the basic fast match is at least as high as the match value that would result from employing the detailed match. In this way, the basic fast match typically overestimates the match value of each phone so that more words are generally selected as candidate words. Words considered candidates according to the detailed match also pass muster in accordance with the basic fast match.
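Setting the replacement value this way is a simple maximization; as a sketch (assuming a list of per-transition dicts mapping label to actual probability; the function name is illustrative):

    def replacement_values(label_probs_by_transition):
        # For one phone machine, set each label's replacement value p'(yk)
        # to the maximum of its actual probabilities over all non-null
        # transitions, guaranteeing the fast match overestimates.
        best = {}
        for label_probs in label_probs_by_transition:
            for label, p in label_probs.items():
                if p > best.get(label, 0.0):
                    best[label] = p
        return best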
Referring to FIG.12, a phone machine 3000 for the basic fast match is illustrated. Labels (also referred to as symbols and fenemes) enter the basic fast match phone machine 3000 together with a start-time distribution. The start-time distribution and the label string input is like that entering the detailed match phone machine described hereinabove. It should be realized that the
start time may, on occasion, not be a distribution over a plurality of times but may, instead, represent a precise time --for example following an interval of silence-- at which the phone begins. When speech is continuous, however, the end-time distribution is used to define the start-time distribution (as is discussed in greater detail hereinbelow). The phone machine 3000 generates an end-time distribution and a match value for the particular phone from the generated end-time distribution. The match score for a word is defined as the sum of match values for component phones --at least the first h phones in the word.
Referring now to FIG.13, a diagram of a basic fast match computation is illustrated. The basic fast match computation is only concerned with the start-time distribution, the number --or length of labels-- produced by the phone, and the replacement values p'(yk) associated with each label yk. By substituting all actual label probabilities for a given label in a given phone machine by a corresponding replacement value, the basic fast match replaces transition probabilities with length distribution probabilities and obviates the need for including actual label probabilities (which can differ for each transition in a given phone machine) and probabilities of being at a given state at a given time.
In this regard, the length distributions are determined from the detailed match model. Specifically, for each length in the length distribution, the procedure preferably examines each state individually and determines
for each state the various transition paths by which the currently examined state can occur (a) given a particular label length and (b) regardless of the outputs along the transitions. The probabilities for all transition paths of the particular length to each subject state are summed and the sums for all the subject states are then added to indicate the probability of a given length in the distribution. The above procedure is repeated for each length. In accordance with the preferred form of the matching procedure, these computations are made with reference to a trellis diagram as is known in the art of Markov modelling. For transition paths which share branches along the trellis structure, the computation for each common branch need be made only once and is applied to each path that includes the common branch.
In the diagram of FIG.13, two limitations are included by way of example. First, it is assumed that the length of labels produced by the phone can be zero, one, two, or three having respective probabilities of l0, l1, l2, and l3. The start time is also limited, permitting only four start times having respective probabilities of q0, q1, q2, and q3. With these limitations, the following equations define the end-time distribution of a subject phone as:
Φ0 = q0 l0
Φ1 = q1 l0 + q0 l1 p1
Φ2 = q2 l0 + q1 l1 p2 + q0 l2 p1 p2
Φ3 = q3 l0 + q2 l1 p3 + q1 l2 p2 p3 + q0 l3 p1 p2 p3
Φ4 = q3 l1 p4 + q2 l2 p3 p4 + q1 l3 p2 p3 p4
Φ5 = q3 l2 p4 p5 + q2 l3 p3 p4 p5
Φ6 = q3 l3 p4 p5 p6
In examining the equations, it is observed that Φ3 includes a term corresponding to each of four start times. The first term represents the probability that the phone starts at time t=t3 and produces a length of zero labels --the phone starting and ending at the same time. The second term represents the probability that the phone starts at time t=t2, that the length of labels is one, and that a label 3 is produced by the phone. The third term represents the probability that the phone starts at time t=t1, that the length of labels is two (namely labels 2 and 3), and that labels 2 and 3 are produced by the phone. Similarly, the fourth term represents the probability that the phone starts at time t=t0; that the length of labels is three; and that the three labels 1, 2, and 3 are produced by the phone.
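The same end-time distribution can be written compactly as a sum over (start time, length) pairs. A minimal sketch, with the indexing conventions assumed for illustration:

    def basic_fast_match_end_times(q, l, p):
        # Sketch of the FIG.13 computation. q[0..3] are start-time
        # probabilities, l[0..3] are label-length probabilities, and
        # p[1..6] are the replacement probabilities p'(y) of the labels
        # observed at times 1..6 (p[0] is unused).
        phi = [0.0] * 7
        for end in range(7):
            for length in range(4):
                start = end - length
                if 0 <= start <= 3:
                    term = q[start] * l[length]
                    for m in range(start + 1, end + 1):
                        term *= p[m]  # one replacement value per label
                    phi[end] += term
        return phi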
Comparing the computations required in the basic fast match with those required by the detailed match suggests the relative simplicity of the former relative to the latter. In this regard, it is noted that the p'(y) value remains the same for each appearance in all the equations as do the label length probabilities. Moreover, with the length and start time limitations, the computations for the later end times become simpler. For example, at Φ6, the phone must start at time t=t3 and all three labels 4, 5, and 6 must be produced by the phone for that end time to apply.

In generating a match value for a subject phone, the end time probabilities along the defined end-time distribution are summed. If desired, the log of the sum is taken to provide the expression:

match value = log10(Φ0 + Φ1 + ... + Φ6)

As noted previously, a match score for a word is readily determined by summing the match values for successive phones in a particular word.
In describing the generating of the start time distribution, reference is made to FIG.14. In FIG.14(a), the word THE1 is repeated, broken down into its component phones. In FIG.14(b), the string of labels is depicted over time. In FIG.14(c), a first start-time distribution is shown. The first start-time distribution has been derived from the end-time distribution of the most recent previous phone (in the previous word which may include a "word" of silence). Based on the label inputs and the start-time distribution of FIG.14(c), the end-time distribution for the phone DH, ΦDH, is generated. The start-time distribution for the next phone, UH, is determined by recognizing the time during which the previous phone end-time distribution exceeded a threshold (A) in FIG.14(d). (A) is determined individually for each end-time distribution. Preferably, (A) is a function of the sum of the end-time distribution values for a subject phone. The interval between times a and b thus represents the time during which the start-time distribution for the phone UH is set. (See FIG.14(e).) The interval between times c and d in FIG.14(e) corresponds
to the times between which the end-time distribution for the phone DH exceeds the threshold (A) and between which the start-time distribution of the next phone is set. The values of the start-time distribution are obtained by normalizing the end-time distribution by, for example, dividing each end-time value by the sum of the end-time values which exceed the threshold (A).
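A minimal sketch of this thresholded renormalization, assuming the end-time distribution is indexed by time and the threshold (A) is supplied by the caller:

    def next_start_times(end_time_dist, threshold):
        # Keep the times whose end-time value exceeds the threshold (A)
        # and renormalize over those times to form the next phone's
        # start-time distribution.
        kept = {t: v for t, v in enumerate(end_time_dist) if v > threshold}
        total = sum(kept.values())
        return {t: v / total for t, v in kept.items()} if total > 0 else {}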
The basic fast match phone machine 3000 has been implemented in a Floating Point Systems Inc. 190L with an APAL program. Other hardware and software may also be used to develop a specific form of the matching procedure by following the teaching set forth herein.
E. Alternative Fast Match
The basic fast match employed alone or, preferably, in conjunction with the detailed match and/or a language model greatly reduces computation requirements. To further reduce computational requirements, the present teachings further simplify the detailed match by defining a uniform label length distribution between two lengths --a minimum length Lmin and a maximum length Lmax. In the basic fast match, the probabilities of a phone generating labels of a given length --namely l0, l1, l2, etc.-- typically have differing values. According to the alternative fast match, the probability for each length of labels is replaced by a single uniform value.

Preferably, the minimum length is equal to the smallest length having a nonzero probability in the original length distribution, although other lengths may be selected if desired. The selection of the maximum length is more arbitrary than the selection of the minimum length, but is significant in that the probability of lengths less than the minimum and greater than the maximum are set as zero. By defining the length probability to exist between only the minimum length and the maximum length, a uniform pseudo-distribution can be set forth. In one approach, the uniform probability can be set as the average probability over the pseudo-distribution. Alternatively, the uniform probability can be set as the maximum of the length probabilities that are replaced by the uniform value.
The effect of characterizing all the label length probabilities as equal is readily observed with reference to the equations set forth above for the end-time distribution in the basic fast match. Specifically, the length probabilities can be factored out as a constant. With Lmin being set at zero and all length probabilities being replaced by a single constant value, the end-time distribution can be characterized as:

Φm = l qm + Φm-1 pm

where "l" is the single uniform replacement value and where the value for pm corresponds preferably to the replacement value for a given label being generated in the given phone at time m.
For the above equation for Φm, the match value is defined as:

match value = log10(Φ0 + Φ1 + ... + Φm)

In comparing the basic fast match and the alternative fast match, it has been found that the number of required additions and multiplications are greatly reduced by employing the alternative fast match phone machines. With Lmin = 0, it has been found that the basic fast match requires forty multiplications and twenty additions in that the length probabilities must be considered. With the alternative fast match, Φm is determined recursively and requires one multiplication and one addition for each successive Φm.
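The recursion can be sketched directly (names and indexing are assumptions; the uniform length value l is factored out here and taken as 1):

    def alt_fast_match_end_times(q, p, n_end_times):
        # Recursive evaluation of Φm = l*q[m] + Φ[m-1]*p[m]: one
        # multiplication and one addition per successive end time once
        # the start times are exhausted.
        phi = []
        prev = 0.0
        for m in range(n_end_times):
            qm = q[m] if m < len(q) else 0.0  # start times outside the
            prev = qm + prev * p[m]           # distribution contribute 0
            phi.append(prev)
        return phi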
To further illustrate how the alternative fast match simplifies computations, FIG.15 and FIG.16 are provided. In FIG.15(a), a phone machine embodiment 3100 corresponding to a minimum length Lmin=0 is depicted. The maximum length is assumed to be infinite so that the length distribution may be characterized as uniform. In FIG.15(b), the trellis diagram resulting from the phone machine 3100 is shown. Assuming that start times after qn are outside the start-time distribution, all determinations of each successive Φm with m<n require one addition and one multiplication. For determinations of end times thereafter, there is only one required multiplication and no additions.
In FIG.16, Lmin = 4. FIG.16(a) illustrates a specific embodiment of a phone machine 3200 therefor and
FIG.16(b) shows a corresponding trellis diagram. Because Lmin=4, the trellis diagram of FIG.16(b) has a zero probability along the paths marked u, v, w, and z. For those end times which extend between Φ4 and Φn, it is noted that four multiplications and one addition is required. For end times greater than n+4, one multiplication and no additions are required. This embodiment has been implemented in APAL code on a FPS 190L.
In Appendix 2, a program listing corresponding to the
main computational kernel of the fast (approximate)
match is provided. The code corresponds to the case
where Lmin=4.
It should be noted that additional states may be added
to the FIG.15 or FIG.16 embodiments as desired.
F. Matching Based On First J Labels
As a further refinement to the basic fast match and alternative fast match, it is contemplated that only the first J labels of a string which enters a phone machine be considered in the match. Assuming that labels are produced by the acoustic processor of an acoustic channel at the rate of one per centisecond, a reasonable value for J is one hundred. In other words, labels corresponding to on the order of one second of speech will be provided to determine a match between a phone and the labels entering the phone machine. By limiting the number of labels examined, two advantages are realized.

First, decoding delay is reduced and, second, problems in comparing the scores of short words with long words are substantially avoided. The length of J can, of course, be varied as desired.
The effect of limiting the number of labels examined can be noted with reference to the trellis diagram of FIG.16(b). Without the present refinement, the fast match score is the sum of the probabilities of Φm's along the bottom row of the diagram. That is, the probability of being at state S4 at each time starting at t=t0 (for Lmin=0) or t=t4 (for Lmin=4) is determined as a Φm and all Φm's are then totalled. For Lmin=4, there is no probability of being in state S4 at any time before t4. With the refinement, the summing of Φm's terminates at time J. In FIG.16(b), time J corresponds to time tn+2.

Terminating the examination of J labels over J time intervals can result in the following two probability summations in determining a match score. First, as described hereinbefore, there is a row calculation along the bottom row of the trellis diagram but only up to the time J-1. The probabilities of being in state S4 at each time up to time J-1 are summed to form a row score. Second, there is a column score which corresponds to the sum of probabilities that the phone is at each respective state S0 through S4 at time J. That is, the column score is:

column score = Σf=0..4 Pr(Sf,J)

The match score for a phone is obtained by summing the row score and column score and then taking the logarithm of that sum. To continue the fast match for the next phone, the values along the bottom row --preferably including time J-- are used to derive the next phone start-time distribution.
After determining a match score for each of b consecutive phones, the total for all phones is, as before noted, the sum of the match scores for all the phones.

In examining the manner in which the end-time probabilities are generated in the basic fast match and alternative fast match embodiments set forth above, it is noted that the determination of column scores does not conform readily to the fast match computations. To better adapt the refinement of limiting the number of labels examined to the fast match and alternative match, the present matching technique provides that the column score be replaced by an additional row score. That is, an additional row score is determined for the phone being at state S4 (in FIG.16(b)) between times J and J+K where K is the maximum number of states in any phone machine. Hence, if any phone machine has ten states, the present refinement adds ten end times along the bottom row of the trellis for each of which a probability is determined. All the probabilities along the bottom row up to and including the probability at time J+K are added to produce a match score for the given phone. As before, consecutive phone match values are summed to provide a word match score.
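With the column score replaced by the additional row score, the phone match score reduces to one sum along the bottom row; a minimal sketch (argument names are assumptions):

    import math

    def limited_match_score(bottom_row, J, K):
        # First-J-labels refinement with the column score replaced by an
        # additional row score: sum the bottom-row (final-state)
        # probabilities up to and including time J+K, then take the log.
        total = sum(bottom_row[: J + K + 1])
        return math.log10(total) if total > 0 else float("-inf")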

This embodiment has been implemented in APAL code on a FPS 190L; however, as with other portions of the system, it may be implemented with other codes on other hardware.
G. Phone Tree Structure and Fast Match Embodiments
By employing the basic fast match or alternative fast match --with or without the maximum label limitation-- the computational time required in determining phone match values is tremendously reduced. In addition, the computational savings remain high even when the detailed match is performed on the words in the fast match derived list.
The phone match values, once determined, are compared along the branches of a tree structure 4100 as shown in FIG.17 to determine which paths of phones are most probable. In FIG.17, the phone match values for DH and UH1 (emanating from point 4102 to branch 4104) should sum to a much higher value for the spoken word "the" than the various sequences of phones branching from the phone MX. In this regard, it should be observed that the phone match value of the first MX phone is computed only once and is then used for each baseform extending therefrom. (See branches 4104 and 4106.) In addition, when the total score calculated along a first sequence of branches is found to be much lower than a threshold value or much lower than the total score for other sequences of branches, all baseforms extending from the first
sequence may be simultaneously eliminated as candidate words. For example, baseforms associated with branches 4108 through 4118 are simultaneously discarded when it is determined that MX is not a likely path.

With the fast match embodiments and the tree structure, an ordered list of candidate words is generated with great computational savings.
With regard to storage requirements, it is noted that the tree structure of phones, the statistics for the phones, and tail probabilities are to be stored. With regard to the tree structure, there are 25000 arcs and four datawords characterizing each arc. The first dataword represents an index to successor arcs or phones. The second dataword indicates the number of successor phones along the branch. The third dataword indicates at which node in the tree the arc is located. And the fourth dataword indicates the current phone. Hence, for the tree structure, 25000 x 4 storage spaces are required. In the fast match, there are 100 distinct phones and 200 distinct fenemes. In that a feneme has a single probability of being produced anywhere in a phone, storage for 100 x 200 statistical probabilities is required. Finally, for the tail probabilities, 200 x 200 storage spaces are required. 100K integer and 60K floating point storage is sufficient for the fast match.
H. Language Model
As noted previously, a language model which stores information --such as tri-grams-- relating to words in context may be included to enhance the probability of a correct word selection. Language models have been reported in the literature.

The language model 1010, preferably, has a unique character. Specifically, a modified tri-gram method is used. In accordance with this method, a sample text is examined to determine the likelihood of each ordered triplet of words, ordered pair of words, or single words in the vocabulary. A list of the most likely triplets of words and a list of the most likely pairs of words are formed. Moreover, the likelihood of a triplet not being in the triplet list and the likelihood of a pair not being in the pair list are respectively determined.
In accordance with the language model, when a subject word follows two words, a determination is made as to whether the subject word and the two preceding words are on the triplet list. If so, the stored probability assigned to the triplet is indicated. If the subject word and its two predecessors are not on the triplet list, a determination is made as to whether the subject word and its adjacent predecessor are on the pair list. If so, the probability of the pair is multiplied by the probability of a triplet not being on the triplet list, the product then being assigned to the subject word. If the subject word and its predecessor(s) are not on the triplet list or pair list, the probability of the
subject word alone is multiplied by the likelihood of a triplet not being on the triplet list and by the probability of a pair not being on the pair list. The product is then assigned to the subject word.
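This back-off logic can be sketched compactly; the table names and argument layout below are assumptions made for illustration, not the patent's data structures:

    import math

    def modified_trigram_score(w1, w2, w3, triplets, pairs, unigrams,
                               p_no_triplet, p_no_pair):
        # triplets, pairs, and unigrams map word tuples/words to stored
        # probabilities; p_no_triplet and p_no_pair are the likelihoods
        # of a triplet or pair being absent from its list. Assumes every
        # vocabulary word has a stored single-word probability.
        if (w1, w2, w3) in triplets:
            p = triplets[(w1, w2, w3)]
        elif (w2, w3) in pairs:
            p = p_no_triplet * pairs[(w2, w3)]
        else:
            p = p_no_triplet * p_no_pair * unigrams[w3]
        return math.log10(p)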
Referring to FIG.18, a flowchart 5000 illustrating the training of phone machines employed in acoustic matching is shown. At step 5002, a vocabulary of words --typically on the order of 5000 words-- is defined. Each word is then represented by a sequence of phone machines. The phone machines have, by way of example, been shown as phonetic-type phone machines but may, alternatively, comprise a sequence of fenemic phones. Representing words by a sequence of phonetic-type phone machines or by a sequence of fenemic phone machines is discussed hereinbelow. A phone machine sequence for a word is referred to as a word baseform.
In step 5006, the word baseforms are arranged in the tree structure described hereinabove. The statistics for each phone machine in each word baseform are determined by training according to the well-known forward-backward algorithm set forth in the article "Continuous Speech Recognition by Statistical Methods" by F. Jelinek.
At step 5009, values to be substituted for actual parameter values or statistics used in the detailed match are determined. For example, the values to be substituted for the actual label output probabilities are determined. In step 5010, the determined values replace the stored actual probabilities so that the phones in
each word baseform include the approximate substitute values. All approximations relating to the basic fast match are performed in step 5010.

A decision is then made as to whether the acoustic matching is to be enhanced (step 5011). If not, the values determined for the basic approximate match are set for use and other estimations relating to other approximations are not set (step 5012). If enhancement is desired, step 5018 is followed. A uniform string length distribution is defined (step 5018) and a decision is made as to whether further enhancement is desired (step 5020). If not, label output probability values and string length probability values are approximated and set for use in the acoustic matching. If further enhancement is desired, acoustic matching is limited to the first J labels in the generated string (step 5022). Whether or not one of the enhanced embodiments is selected, the parameter values determined are set in step 5012, whereupon each phone machine in each word baseform has been trained with the desired approximations that enable the fast approximate matching.
J. Stack Decoder
A preferred stack decoder used in the speech recognition system of FIG.1 has been invented by L. Bahl, F. Jelinek, and R.L. Mercer of the IBM Speech Recognition Group. The preferred stack decoder is now described.

In FIG.19 and FIG.20 a plurality of successive labels y1y2... are shown generated at successive "label intervals", or "label positions".

Also shown in FIG.20 are a plurality of some generated word paths, namely path A, path B, and path C. In the context of FIG.19, path A could correspond to the entry "to be or", path B to the entry "two b", and path C to the entry "too". For a subject word path, there is a label (or equivalently a label interval) at which the subject word path has the highest probability of having ended --such label being referred to as a "boundary label".
For a word path W representing a sequence of words, a most likely end time --represented in the label string as a "boundary label" between two words-- can be found by known methods such as that described in an article entitled "Faster Acoustic Match Computation" (by L.R. Bahl, F. Jelinek, and R.L. Mercer) in the IBM Technical Disclosure Bulletin, volume 23, number 4, September 1980. Briefly, the article discusses methodology for addressing two similar concerns: (a) how much of a label string Y is accounted for by a word (or word sequence) and (b) at which label interval does a partial sentence --corresponding to a part of the label string-- end.
For any given word path, there is a "likelihood value" associated with each label or label interval, including the first label of the label string through to the boundary label. Taken together, all of the likelihood
values for a given word path represent a "likelihood vector" for the given word path. Accordingly, for each word path there is a corresponding likelihood vector. Likelihood values Lt are illustrated in FIG.20.

A "likelihood envelope" Λt at a label interval t for a collection of word paths W1, W2, ..., Ws is defined mathematically as:

Λt = max(Lt(W1), ..., Lt(Ws))

That is, for each label interval, the likelihood envelope includes the highest likelihood value associated with any word path in the collection. A likelihood envelope 1040 is illustrated in FIG.20.
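A minimal sketch of the envelope computation, assuming each path's likelihood vector is a list and treating positions beyond a path's boundary label as -infinity:

    def likelihood_envelope(likelihood_vectors):
        # Λt: for each label interval t, the maximum likelihood value
        # over all word paths in the collection.
        n = max(len(v) for v in likelihood_vectors)
        return [
            max((v[t] for v in likelihood_vectors if t < len(v)),
                default=float("-inf"))
            for t in range(n)
        ]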
A word path is considered "complete" if it corresponds to a complete sentence. A complete path is preferably identified by a speaker entering an input, e.g. pressing a button, when he reaches the end of a sentence. The entered input is synchronized with a label interval to mark a sentence end. A complete word path cannot be extended by appending any words thereto. A "partial" word path corresponds to an incomplete sentence and can be extended.

Partial paths are classified as "live" or "dead". A word path is "dead" if it has already been extended and "live" if it has not. With this classification, a path which has already been extended to form one or more longer
extended word paths is not reconsidered for extension at a subsequent time.

Each word path is also characterizable as "good" or "bad" relative to the likelihood envelope. The word path is good if, at the label corresponding to the boundary label thereof, the word path has a likelihood value which is within Δ of the maximum likelihood envelope. Otherwise the word path is marked as "bad". Preferably, but not necessarily, Δ is a fixed value by which each value of the maximum likelihood envelope is reduced to serve as a good/bad threshold level.
For each label interval there is a stack element. Each live word path is assigned to the stack element corresponding to the label interval that corresponds to the boundary label of such a live path. A stack element may have zero, one, or more word path entries --the entries being listed in order of likelihood value.

The steps performed by the stack decoder 1002 of FIG.1 are now discussed.
Forming the likelihood envelope and determining which word paths are "good" are interrelated as suggested by the sample flowchart of FIG.21.
In the flowchart of FIG.21, a null path is first entered into the first stack(0) in step 5050. A stack(complete) element is provided which contains complete paths, if any, which have been previously determined (step 5052).

Each complete path in the stack(complete) element has a likelihood vector associated therewith. The likelihood vector of the complete path having the highest likelihood at the boundary label thereof initially defines the maximum likelihood envelope. If there is no complete path in the stack(complete) element, the maximum likelihood envelope is initialized as -∞ at each label interval. Moreover, if complete paths are not specified, the maximum likelihood envelope may be initialized at -∞. Initializing the envelope is depicted by steps 5054 and 5056.
After the maximum likelihood envelope is initialized, it is reduced by a predefined amount Δ to form a Δ-good region above the reduced likelihoods and a Δ-bad region below the reduced likelihoods. The value of Δ controls the breadth of the search. The larger Δ is, the larger the number of word paths that are considered for possible extension. When log10 is used for determining Lt, a value of 2.0 for Δ provides satisfactory results. The value of Δ is preferably, but not necessarily, uniform along the length of label intervals.

If a word path has a likelihood at the boundary label thereof which is in the Δ-good region, the word path is marked "good". Otherwise, the word path is marked "bad".
As shown in FIG.21, a loop for up-dating the likelihood envelope and for marking word paths as "good" (for possible extension) or "bad" starts with the finding of the longest unmarked word path (step 5058). If more than one
unmarked word path is in the stack corresponding to the longest word path length, the word path having the highest likelihood at the boundary label thereof is selected. If a word path is found, it is marked as "good" if the likelihood at the boundary label thereof lies within the Δ-good region or "bad" otherwise (step 5060). If the word path is marked "bad", another unmarked live path is found and marked (step 5062). If the word path is marked "good", the likelihood envelope is up-dated to include the likelihood values of the path marked "good". That is, for each label interval, an up-dated likelihood value is determined as the greater likelihood value between (a) the present likelihood value in the likelihood envelope and (b) the likelihood value associated with the word path marked "good". This is illustrated by steps 5064 and 5066. After the envelope is up-dated, a longest best unmarked live word path is again found (step 5058).
The loop is then repeated until no unmarked word paths remain. At that time, the shortest word path marked "good" is selected. If there is more than one "good" word path having a shortest length, the one having the highest likelihood at the boundary label thereof is selected (step 5070). The selected shortest path is then subjected to extension. That is, at least one likely follower word is determined as indicated above by preferably performing the fast match, language model, detailed match, and language model procedure. For each likely follower word, an extended word path is formed. Specifically, an extended word path is formed by
appending a likely follower word on the end of the selected shortest word path.
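A much-simplified sketch of one iteration of this selection-and-extension loop follows; the path representation, tie-breaking, and the extend_fn stand-in for the fast match / language model / detailed match sequence are all assumptions made for illustration, not the patent's implementation:

    def extend_shortest_good_path(stacks, envelope, delta, extend_fn):
        # stacks maps a boundary-label interval t to a list of live
        # paths, each a (likelihood_vector, word_sequence) pair. A path
        # is "good" if its likelihood at its boundary label lies within
        # delta of the envelope.
        good = [(t, p) for t, paths in stacks.items() for p in paths
                if p[0][t] >= envelope[t] - delta]
        if not good:
            return False  # no good live paths remain; decoding finished
        # Shortest good path, ties broken by boundary-label likelihood.
        t, chosen = min(good, key=lambda g: (g[0], -g[1][0][g[0]]))
        stacks[t].remove(chosen)
        for boundary, vector, words in extend_fn(chosen):
            stacks.setdefault(boundary, []).append((vector, words))
        return True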
After the selected shortest word path is formed into extended word paths, the selected word path is removed from the stack in which it was an entry and each extended word path is entered into the appropriate stack therefor. In particular, an extended word path becomes an entry into the stack corresponding to the boundary label of the extended word path (step 5072).
With regard to step 5072, the action of extending the chosen path is discussed with reference to the flowchart of FIG.22. After the path is found in step 5070, the following procedure is performed whereby a word path or paths are extended based on an appropriate approximate match.
At step 6000, the acoustic processor 1004 (of FIG.1) generates a string of labels as described hereinabove. The string of labels is provided as input to enable step 6002 to be performed. In step 6002 the basic or one of the enhanced approximate matching procedures is performed to obtain an ordered list of candidate words according to the teachings outlined hereinabove.
Thereafter, a language model (as described hereinabove) is applied in step 6004. The subject words remaining after the language model is applied are entered together with the generated labels in a detailed match processor which performs step 6006. The detailed match results in a list of remaining candidate
words which are preferably subjected to the language model in step 6008. The likely words --as determined by the approximate match, detailed match, and language model-- are used for extension of the path found in step 5070 of FIG.21. Each of the likely words determined at step 6008 (FIG.22) is separately appended to the found word path so that a plurality of extended word paths may be formed.
Referring again to FIG.21, after the extended paths are formed and the stacks are re-formed, the process repeats by returning to step 5052.
Each iteration thus consists of selecting the shortest
best "good" word path and extending it. A word path
marked "bad" on one iteration may become "good" on a
later iteration. The characterization of a live word
path as "good" or "bad" is thus made independently on
each iteration. In practice, the likelihood envelope
does not change greatly from one iteration to the next
and the computation to decide whether a word path is
"good" or "bad" is done efficiently. Moreover, normal-
ization is not required.
When complete sentences are identified, step 5074 is
preferably included. That is, when no live word paths
remain unmarked and there are no "good" word paths to
be extended, decoding is finished. The complete word
path having the highest likelihood at the respective
boundary label thereof is identified as the most likely
word sequence for the input label string.

In the case of continuous speech where sentence endings are not identified, path extension proceeds continually or for a predefined number of words as preferred by the system user.
K. Constructing Phonetic Baseforms
One type of Markov model phone machine which can be used
in forming baseforms is based on phonetics. That is,
each phone machine corresponds to a given phonetic
sound, such as those included in the International Pho-
netic Alphabet.
For a given word, there is a sequence of phonetic sounds each having a respective phone machine corresponding thereto. Each phone machine includes a number of states and a number of transitions between states, some of which can produce a feneme output and some (referred to as null transitions) which cannot. Statistics relating to each phone machine --as noted hereinabove-- include (a) the probability of a given transition occurring and (b) the likelihood of a particular feneme being produced at a given transition. Preferably, at each non-null transition there is some probability associated with each feneme. In a feneme alphabet shown in Table 1, there are preferably 200 fenemes. A phone machine used in forming phonetic baseforms is illustrated in FIG.3. A sequence of such phone machines is provided for each word. The statistics, or probabilities, are entered into
the phone machines during a training phase in which known words are uttered. Transition probabilities and feneme probabilities in the various phonetic phone machines are determined during training by noting the feneme string(s) generated when a known phonetic sound is uttered at least once and by applying the well-known forward-backward algorithm.
A sample of statistics for one phone identified as phone DH are set forth in Table 2. As an approximation, the label output probability distributions for transitions tr1, tr2, and tr8 of the phone machine of FIG.3 are represented by a single distribution; transitions tr3, tr4, tr5, and tr9 are represented by a single distribution; and transitions tr6, tr7, and tr10 are represented by a single distribution. This is shown in Table 2 by the assignment of arcs (i.e. transitions) to the respective columns 4, 5, or 6. Table 2 shows the probability of each transition and the probability of a label (i.e. feneme) being generated in the beginning, middle, or end, respectively, of the phone DH. For the DH phone, for example, the probability of the transition from state S1 to state S2 is counted as .07243. The probability of transition from state S1 to state S4 is .92757. (In that these are the only two possible transitions from the initial state, their sum equals unity.) As to label output probabilities, the DH phone has a .091 probability of producing the feneme AE13 (see Table 1) at the end portion of the phone, i.e. column 6 of Table 2. Also in Table 2 there is a count associated with each node (or state). The node count is indicative of the
number of times during the training that the phone was in the corresponding state. Statistics as in Table 2 are found for each phone machine.

The arranging of phonetic phone machines into a word baseform sequence is typically performed by a phonetician and is normally not done automatically.
L. Constructing Fenemic Baseforms
FIG.23 shows an embodiment of a fenemic phone. The fenemic phone has two states and three transitions. A null transition is indicated with dashed lines and represents a path from state 1 to state 2 in which no label can be produced. A self-loop transition at state 1 permits any number of labels to be produced thereat. A non-null transition between state 1 and state 2 is permitted to have a label produced thereat. The probabilities associated with each transition and with each label at a transition are determined during a training session in a manner similar to that previously described with reference to phonetic-type baseforms.
Fenemic word baseforms are constructed by concatenating fenemic phones. One approach is described in the co-pending application entitled "Feneme-based Markov Models for Words". Preferably, the fenemic word baseforms are grown from multiple utterances of the corresponding word. This is described in a co-pending and commonly assigned application entitled "Constructing Markov Models of Words from Multiple Utterances", Canadian Patent Application No. 504,801, filed March 24, 1986.

Briefly, one method of growing baseforms from multiple
utterances includes the steps of:
(a) transforming multiple utterances of the word segment into respective strings of fenemes;

(b) defining a set of fenemic Markov model phone machines;

(c) determining the best single phone machine P1 for producing the multiple feneme strings;

(d) determining the best two phone baseform of the form P1P2 or P2P1 for producing the multiple feneme strings;

(e) aligning the best two phone baseform against each feneme string;

(f) splitting each feneme string into a left portion and a right portion with the left portion corresponding to the first phone machine of the two phone baseform and the right portion corresponding to the second phone machine of the two phone baseform;

(g) identifying each left portion as a left substring and each right portion as a right substring;

(h) processing the set of left substrings in the same manner as the set of feneme strings corresponding to the multiple utterances, including the further step of inhibiting further splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform;

(j) processing the set of right substrings in the same manner as the set of feneme strings corresponding to the multiple utterances, including the further step of inhibiting further splitting of a substring when the single phone baseform thereof has a higher probability of producing the substring than does the best two phone baseform; and

(k) concatenating the unsplit single phones in an order corresponding to the order of the feneme substrings to which they correspond.
The number of model elements is typically approximately the number of fenemes obtained for an utterance of the word. The baseform models are then trained (or filled with statistics) by speaking known utterances into an acoustic processor that generates a string of labels
in response thereto. Based on the known utterances and the generated labels, the statistics for the word models are derived according to the well known forward-backward algorithm discussed in articles referred to hereinabove.
In FIG.24 a lattice corresponding to fenemic phones is illustrated. The lattice is significantly simpler than the lattice of FIG.11 relating to a phonetic detailed match. As noted hereinabove, a fenemic detailed match proceeding one time interval at a time through the lattice of FIG.24 is set forth in Appendix 1.
(II) Selecting Likely Words From a Vocabulary by Means of Polling
Referring to FIG.25, a flowchart 8000 of one embodiment of the present invention is illustrated. As depicted in FIG.25, a vocabulary of words is initially prescribed in step 8002. The words may relate to standard office correspondence words or technical words, depending on the user. The vocabulary has been on the order of 5000 words or more, although the number of words may vary.
Each word is represented by a sequence of Markov model phone machines in accordance with the teachings of Section (I)(K) or (I)(L). That is, each word may be represented as a constructed baseform of sequential phonetic phone machines or a constructed baseform of sequential fenemic phone machines.

A "vote" is then determined for each label for each word
in step 8006. The vote determining step 8006 is de-
scribed with r2ference to FIGS.25, 26, 27, 28, and 29.
Fig.26 shows a graph of the distribution of acoustic
labels for a given phone machine Pp~ The counts indl-
cated are extracted from ~he statistics generated during
training. During training, it is recalled, known utter-
ances corresponding to known phone sequences are spoken
and a string of labels generated in response thereto.
The number of times each label is produced when a known
phone is uttered is thus provided during the session.
For each phone, a distribution as in Fig.26 is gener-
ated.
In addition to extracting the information included in FIG.26 from the training data, the expected number of labels for a given phone is also derived from the training data. That is, when a known utterance corresponding to a given phone is spoken, the number of labels generated is noted. Each time the known utterance is spoken, a number of labels for the given phone is noted. From this information, the most likely or expected number of labels for the given phone is defined. FIG.27 is a graph showing the expected number of labels for each phone. If the phones correspond to fenemic phones, the expected number of labels for each phone should typically average about one. For phonetic phones, the number of labels may vary greatly.

The extraction of information in the graphs from training data is achieved by using information from the forward-backward algorithm described in detail in Appendix II of the article "Continuous Speech Recognition by Statistical Methods". Briefly, the forward-backward algorithm involves determining the probability of each transition between a state i and a state (i+1) in a phone by (a) looking forward from the initial state of a Markov model to the state i and determining the statistics of getting to the state i in a "forward pass" and (b) looking backward from the final state of a Markov model to the state (i+1) and determining the statistics of getting to the final state from state (i+1) in a "backward pass". The probability of the transition from state i to state (i+1) --given state i-- and the label outputs thereat are coupled to the other statistics in determining the probability of a subject transition occurring given a certain string of labels. In that Appendix II of the above-noted article sets forth in detail the mathematics and application of the algorithm, further description is not provided.
Each word is known to be a predefined sequence of phones as illustrated in FIG.28 with reference to a WORD 1 and a WORD 2. Given the phone sequence for each word and the information discussed relative to FIGS. 26 and 27, a determination can be made as to how many times a given label is most likely to occur for a particular subject word W. For a word such as WORD 1, the number of times label 1 is expected may be computed as the number of counts of label 1 for phone P1 plus the number of counts

of label 1 for phone P3 plus the number of counts of label 1 for phone P6 and so on. Similarly, for the word WORD 1, the number of times label 2 is expected may be computed as the number of counts of label 2 for phone P1 plus the number of counts of label 2 for phone P3 and so on. The expected count of each label for WORD 1 is evaluated by performing the above steps for each of the two hundred labels. A count for a particular label may represent the number of occurrences of the particular label divided by the total number of labels generated during the training phase or may, alternatively, correspond to just the number of occurrences of the label during training.
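The computation is a simple sum over the word's phones; as a sketch (the data layout is an assumption made for illustration):

    def expected_label_counts(word_phones, phone_label_counts, n_labels=200):
        # Expected count of each label for a word: sum the label's
        # training counts over the phones making up the word.
        # phone_label_counts[phone][label] is the (possibly normalized)
        # count from training.
        expected = [0.0] * n_labels
        for phone in word_phones:
            for label in range(n_labels):
                expected[label] += phone_label_counts[phone][label]
        return expected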
In FIG.29 the expected count for each label in a par-
ticular word (e.g. WORD 1) is set forth.
From the expected label counts shown in FIG.29 for a given word, a "vote" of each label for the given word is evaluated. The vote of a label L' for a given word W' represents the likelihood of the word W' producing the label L'. The vote preferably corresponds to the logarithmic probability of word W' producing L'. Preferably, the vote is determined by:

vote = log10{Pr(L'|W')}

The votes are stored in a table as shown in FIG.30. For each word 1 through W, each label has an associated vote identified as V with a double subscript.
The first element in the subscript corresponds to the label and the second to the word. V12 is therefore the vote of label 1 for word 2.
Referring again to FIG.25, the process of selecting likely candidate words from a vocabulary through polling is shown to include a step 8008 of generating labels in response to an unknown spoken input. This is performed by the acoustic processor 1004 (of FIG.1).
The generated labels are looked up in the table of FIG.30 for a subject word. The vote of each generated label for the subject word is retrieved. The votes are then accumulated to give a total vote for the subject word (step 8010). By way of example, if labels 1, 3, and 5 are generated, votes V11, V31, and V51 would be evaluated and combined. If the votes are logarithmic probabilities, they are summed to provide the total vote for word 1. A similar procedure is followed for each word in the vocabulary, so that labels 1, 3, and 5 vote for each word.
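Step 8010 may be sketched as follows, with votes[w][l] denoting the table entry of FIG.30 for label l and word w:

    def total_votes(generated_labels, votes, n_words):
        totals = [0.0] * n_words
        for label in generated_labels:       # e.g. labels 1, 3, and 5
            for w in range(n_words):
                totals[w] += votes[w][label]   # summed log probabilities
        return totals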
In accordance with one embodiment of the invention, the accumulated vote for each word serves as the likelihood score for the word. The n words (where n is a predefined integer) having the highest accumulated vote are defined as candidate words that are to be later processed in a detailed match and language model as outlined above.
In another embodiment, word "penalties" are evaluated as well as votes. That is, for each word, a penalty is determined and assigned (step 8012). The penalty represents the likelihood that a subject label is not produced by a given word. There are various methods for determining the penalty. One approach for determining the penalty for a word represented by a fenemic baseform involves assuming that each fenemic phone produces only one label. For a given label and a subject fenemic phone, the penalty for the given label corresponds to the log probability of any other label being produced by the subject fenemic phone. The penalty of label 1 for phone P2 then corresponds to the log probability of any label 2 through 200 being the one produced label. The assumption of one label output per fenemic phone, although not a correct one, has proved satisfactory in evaluating penalties. Once a penalty of a label for each phone is determined, the penalty for a word constituting a sequence of known phones is readily determined.
The penalty of each label for each word is shown in FIG.31. Each penalty is identified as PEN followed by two subscripts, the first representing the label and the second representing the word.
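Under the stated one-label-per-phone assumption, the penalty table of FIG.31 may be sketched as follows. Summing the per-phone log probabilities along the word's phone sequence is one natural reading of how the word penalty is "readily determined"; the names, and the small guard inside the logarithm, are hypothetical:

    import math

    def penalty_table(words, word_phones, phone_label_prob, n_labels=200):
        # phone_label_prob[p][l] = probability that phone p produces label l
        pens = {}
        for w in words:
            pens[w] = [sum(math.log10(max(1.0 - phone_label_prob[p][l], 1e-10))
                           for p in word_phones[w])
                       for l in range(n_labels)]
        return pens   # pens[w][l] is the penalty of label l for word w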
Referring again to FIG.25, the labels generated in step 8008 are examined to see which labels of the label alphabet have not been generated. The penalty for each label not generated is evaluated for each word. To obtain the total penalty for a given word, the penalty of each ungenerated label for the given word is retrieved and all such penalties are accumulated (step 8014). If each penalty corresponds to a logarithmic "null" probability, the penalties for the given word are summed over
all labels, as were the votes. The above procedure is
repeated for each word in the vocabulary so that each
word has a total vote and a total penalty given the
string of generated labels.
When a total vote and a total penalty is derived for each vocabulary word, a likelihood score is defined by combining the two values (step 8016). If desired, the total vote may be weighted to be greater than the total penalty, or vice versa.
Moreover, the likelihood score for each word is preferably scaled based on the number of labels that are voting (step 8018). Specifically, after the total vote and the total penalty --both of which represent sums of logarithmic probabilities-- are added together, the final sum is divided by the number of speech labels generated and considered in computing the votes and penalties. The result is a scaled likelihood score.
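Steps 8016 and 8018 may be sketched together as follows; the weighting factors are illustrative only, the text above noting merely that one total may be weighted relative to the other:

    def scaled_score(total_vote, total_penalty, n_labels,
                     w_vote=1.0, w_pen=1.0):
        # combine the two log-probability totals, then scale by the number
        # of labels considered in computing the votes and penalties
        return (w_vote * total_vote + w_pen * total_penalty) / n_labels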
A further aspect of the invention relates to determining which labels in a string to consider in the voting and penalty (i.e. polling) computations. Where the end of a word is identified and the labels corresponding thereto are known, preferably all labels generated between a known start time and the known end time are considered. However, when the end time is found to be not known (in step 8020), the present invention provides the following methodology. A reference end time is defined and a likelihood score is evaluated repeatedly after the reference end time at successive time intervals (step 8022). For example, after 500ms the (scaled) likelihood score for each word is evaluated at 50ms intervals thereafter up to 1000ms from the start time of the word utterance. In the example, each word will have ten (scaled) likelihood scores.
A conservative approach is adopted in selecting which of the ten likelihood scores should be assigned to a given word. Specifically, for the series of likelihood scores obtained for a given word, the likelihood score that is highest relative to the likelihood scores of other words obtained at the same time interval is selected (step 8024). This highest likelihood score is then subtracted from all likelihood scores at that time interval. In this regard, the word having the highest likelihood score at a given interval is set at zero with the likelihood scores of the other less likely words having negative values. The least negative likelihood score for a given word is assigned thereto as the highest relative likelihood score for the word.
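A sketch of steps 8022 and 8024 follows, assuming scores[w][t] holds the scaled likelihood score of word w at the t-th interval (e.g. ten intervals at 50ms spacing):

    def assign_relative_scores(scores):
        n_words, n_intervals = len(scores), len(scores[0])
        assigned = [float("-inf")] * n_words
        for t in range(n_intervals):
            best = max(scores[w][t] for w in range(n_words))
            for w in range(n_words):
                # the interval's best word is set at zero; the rest go negative
                assigned[w] = max(assigned[w], scores[w][t] - best)
        return assigned   # each word keeps its least negative relative score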
When likelihood scores are assigned to each word, the n words having the highest assigned likelihood scores are selected as candidate words resulting from polling (step 8026).
In one embodiment of the invention, the n words resulting from the polling are provided as a reduced list of words that are then subjected to processing according to the detailed match and language model described above. The reduced list obtained by polling, in this embodiment, acts in place of the acoustic fast match outlined above. In this regard, it is observed that the acoustic fast match provides a tree-like lattice structure in which word baseforms are entered as sequential phones, wherein words having the same initial phones follow a common branch along the tree structure. For a 2000 word vocabulary, the polling method has been found to be two to three times faster than the fast acoustic match which includes the tree-like lattice structure.
Alternatively, however, the acoustic fast match and the polling may be used in conjunction. That is, from the trained Markov models and the generated string of labels, the approximate fast acoustic match is performed in step 8028 in parallel with the polling. One list is provided by the acoustic match and one list is provided by the polling. In a conservative approach, the entries on one list are used in augmenting the other list. In an approach which seeks to further reduce the number of best candidate words, only words appearing in both lists are retained for further processing. The interaction of the two techniques in step 8030 depends on the accuracy and computational goals of the system. As yet another alternative, the lattice-style acoustic fast match may be applied to the polling list sequentially.
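Step 8030 may be sketched as follows; the union of the two lists corresponds to the conservative augmenting approach and the intersection to the more restrictive approach:

    def combine_lists(polling_list, fast_match_list, conservative=True):
        if conservative:
            # augment one list with the other (order-preserving union)
            return list(dict.fromkeys(list(polling_list)
                                      + list(fast_match_list)))
        # retain only words appearing in both lists
        return [w for w in polling_list if w in set(fast_match_list)]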
Apparatus 8100 for performing the polling is set forth in FIG.32. Element 8102 stores word models that have been trained as discussed hereinabove. From the statistics applied to the word models, a vote generator 8104 evaluates the vote of each label for each word and stores the votes in vote table storage 8106.

Similarly, a penalty generator 8108 evaluates penalties of each label for each word in the vocabulary and enters the values into penalty table storage 8110.
A word likelihood score evaluator 8112 receives labels generated by an acoustic processor 8114 in response to an unknown spoken input. For a given word selected by a word select element 8116, the word likelihood score evaluator 8112 combines the votes of each generated label for the selected word together with the penalties of each label not generated. The likelihood score evaluator 8112 includes means for scaling the likelihood score as discussed hereinabove. The likelihood score evaluator may also, but need not, include means for repeating the score evaluation at successive time intervals following a reference time.

The likelihood score evaluator 8112 provides word scores to a word lister 8120 which orders the words according to assigned likelihood score.
In an embodiment which combines the word list derived by polling with the list derived by approximate acoustic matching, the list comparer 8122 is provided. The list comparer receives as input the polling list from the word lister 8120 and the list from the acoustic fast match (which is described in several embodiments hereinabove).

To reduce storage and computational requirements, several features may be included. First, the votes and penalties can be formatted as integers ranging between 0 and 255. Second, the actual penalties can be replaced by approximate penalties computed from the corresponding vote as PEN = a*vote + b, where a and b are constants that can be determined by least squares regression. Third, labels can be grouped together into speech classes where each class includes at least one label. The allocation of labels to classes can be determined by hierarchically clustering the labels so as to maximize the resulting mutual information between speech classes and words.
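The first two of these reductions may be sketched as follows; the quantization range mapping and the closed-form least squares fit are illustrative:

    def fit_penalty_line(votes, penalties):
        # fit PEN = a*vote + b by least squares over the table entries
        n = len(votes)
        mean_v = sum(votes) / n
        mean_p = sum(penalties) / n
        a = (sum((v - mean_v) * (p - mean_p)
                 for v, p in zip(votes, penalties))
             / sum((v - mean_v) ** 2 for v in votes))
        b = mean_p - a * mean_v
        return a, b

    def quantize(x, lo, hi):
        # map a score linearly onto the integer range 0..255
        return round(255 * (x - lo) / (hi - lo))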
It should be further noted that, in accordance with the
invention, periods of silence are detected (by known
methods) and are ignored.
The present invention has been implemented in PL/I on an IBM MVS System, but may be implemented in other programming languages and on other systems given the teachings set forth herein.
While the invention has been described with reference to a preferred embodiment thereof, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the invention.
For example, the end time for a word may alternatively be defined by summing the number of expected labels for each phone in the word baseform. In addition, the votes and penalties may be determined for selected generated labels --such as every odd label or just the first m labels in the string-- although it is preferred that votes and penalties for each label between the start and end of a word be considered.
Moreover, the present invention contemplates the use of various voting formulas other than the summing of logarithmic probabilities. The invention generally applies to a polling apparatus and a polling method for obtaining a short list of candidate words wherein each label casts a vote for each word in the vocabulary and this vote typically varies from word to word.
TABLE 1

THE TWO LETTERS ROUGHLY REPRESENT THE SOUND OF THE ELEMENT.

TWO DIGITS ARE ASSOCIATED WITH VOWELS:
    FIRST: STRESS OF SOUND
    SECOND: CURRENT IDENTIFICATION NUMBER

ONE DIGIT ONLY IS ASSOCIATED WITH CONSONANTS:
    SINGLE DIGIT: CURRENT IDENTIFICATION NUMBER

[Five-column table of the 200 labels in the alphabet, numbered 001 (AA11) through 200 (ZX3-); the individual entries are too garbled in this copy to reproduce reliably.]

TABLE 2

PHONE 3 DH. 7 NODES, 13 ARCS, 3 ARC LABELS.

[Statistics table for the phone machine DH: node counts, arc probabilities for each of the three arc labels, and the output probability of each feneme label (e.g. BX1-, DH1-, EH01, TH1-, UH02, and an OTHER row) at each arc label; the numeric entries are too garbled in this copy to reproduce reliably.]

APPENDIX 1

Subroutine EXTCOL.

Extends the column.

Register Usage:

    SP:  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
    DPX: -4, -3, -2, -1, 0 (DPA = 0)
    DPY: -4, -3, -2, -1, 0, 1, 2, 3 (DPA = 0)

Restore the registers from main memory.
EXTCOL: LDMA; DB = SFENPTR; WRTLMN     get feneme list pointer
        LDMA; DB = STLPPTR; WRTLMN     get tail prob list pointer
        LDMA; DB = SSTROFF; WRTLMN     get start offset
        LDSPI FENPTR; DB < MD          save feneme list pointer
        LDSPI TLPPTR; DB < MD          save tail prob list pointer
        LDSPI STROFF; DB < MD          save start offset
        LDMA; DB = SSTRROW; WRTLMN     get start row
        LDMA; DB = SOUTOFF; WRTLMN     get output offset
        LDMA; DB = SINBNDY; WRTLMN     get input boundary pointer
        LDSPI STRROW; DB < MD          save start row
        LDSPI OUTOFF; DB < MD          save output offset
        LDSPI INBNDY; DB < MD          save input boundary pointer
        LDMA; DB = STIME; WRTLMN       get the current time
        LDMA; DB = SSTRLEN; WRTLMN     get start distribution length
        LDMA; DB = ALEXLEX; WRTLMN     get lexeme length
        LDSPI TIME; DB < MD            save current time
        LDSPI STRLEN; DB < MD          save start distribution length
        LDSPI LOOPCNT; DB < MD         save lexeme length
        SUB STRROW, LOOPCNT            loop count = lexlen - strrow
        LDSPI PRMADR; DB = SFENPTR     put back feneme pointer + 1
        INC FENPTR; DPX(-3) < SPFN
        MOV PRMADR, PRMADR; SETMA; MI < DPX(-3)
        LDSPI PRMADR; DB = STLPPTR     put back tail pointer + 1
        INC TLPPTR; DPX(-3) < SPFN
        MOV PRMADR, PRMADR; SETMA; MI < DPX(-3)
        LDSPI PRMADR; DB = SINBNDY     put back inbndy + 1
        INC INBNDY; DPX(-3) < SPFN
        MOV PRMADR, PRMADR; SETMA; MI < DPX(-3)
        LDSPI PRMADR; DB = STIME       put back time + 1
        INC TIME; DPX(-3) < SPFN

        MOV PRMADR, PRMADR; SETMA; MI < DPX(-3)
"...................................................................."
" Get the next feneme and tail probability.                          "
"...................................................................."
        OUT; DB = PAGE0                "flip to page zero
        MOV FENPTR, FENPTR; SETMA      "get next feneme
        MOV TLPPTR, TLPPTR; SETMA      "get next tail probability
        OUT; DB = PAGE1                "flip back to page one
        LDSPI FENEME; DB < MD          "save the feneme
        DPY(TPY) < MD                  "save the tail probability
        MOV FENEME, FENEME; DPX(CFENX) < SPFN
        LDSPI FENLOK; DB = AFENLOK     "get base addr of feneme lookup
        ADD FENEME, FENLOK; SETMA      "use feneme to get fenbas
        LDSPI INCOL; DB = ACOLUMN      "get base addr for input col
        LDSPI OUTCOL; DB = ACOLUMN     "get base addr for output col
        LDSPI FENBAS; DB < MD          "pointer into feneme probs
        ADD STROFF, INCOL              "incol = start offset +
        ADD STRROW, INCOL              "        start row
        ADD OUTOFF, OUTCOL             "outcol = output offset +
        ADD STRROW, OUTCOL             "        start row - 2
        LDSPI TRMLOK; DB = ATRMLOK     "get base addr of tram lookup
        ADD STRROW, TRMLOK; SETMA      "lookup the tram base ptr
        DEC OUTCOL                     "outcol = outcol - 1
        DEC OUTCOL                     "outcol = outcol - 2
        LDSPI TRMPTR; DB < MD          "get starting tram addr
        LDSPI PRMADR; DB = !FFTSZ      "compensate for tram base
        ADD PRMADR, PRMADR             "which is at 2 x !FFTSZ
        ADD PRMADR, TRMPTR
"...................................................................."
" Start the pipeline going.                                          "
"...................................................................."
        MOV INCOL, INCOL; SETMA        "get col(start ptr)
        MOV INBNDY, INBNDY; SETMA      "get inbndy(time) in case
        SUB# TIME, STRLEN              "start length - time
        BLT OUTSIDE;                   "if <= we are outside inbndy
        DPY(LIY) < MD                  "and save col(start ptr)
INSIDE: FADD DPY(LIY), MD              "col() + inbndy(time)
        FADD                           "push the adder
        DPY(LIY) < FA                  "last input = col + inbndy
OUTSIDE: DPY(PROFY) < ZERO             "zero out the running profile
" Start the pipeline going... start first loop                       "
        MOV TRMPTR, TRMPTR; SETTMA     "start transfer of phone num
        NOP                            "wait for table memory
        LDSPI PHONE; DB < TM;          "save the phone number

        INCTMA                         "increment tramptr
        ADD FENBAS, PHONE; SETMA       "lookup feneme prob
        FADD DPY(LIY), ZERO;           "push last state thru adder
        INCTMA                         "start transfer of null trans
        FADD;                          "push the adder
        INCTMA;                        "start transfer of phone num
        DPY(SOFAY) < ZERO              "clear 2nd oldest farc
"...................................................................."
" Start the pipeline going... do first loop                          "
"...................................................................."
        FMUL TM, FA;                   "null trans * last_input
        INC INCOL; SETMA;              "start transfer of col()
        DPY(LIY) < FA;                 "save last input
        FADD DPY(PROFY), FA;           "profile = prof + last input
        DPX(FENX) < MD
        FMUL DPX(FENX), DPY(TPY);      "feneme prob * tail prob
        LDSPI PHONE; DB < TM;          "save the phone number
        INCTMA;                        "start trans of self trans
        FADD                           "push the adder
        FMUL;
        DPY(PROFY) < FA;               "save the running profile
        ADD FENBAS, PHONE; SETMA;      "lookup feneme prob
        DPX(FAX) < ZERO;               "zero out forward arc
        FADD                           "null add
        FADD FM, MD;                   "col() + null*last input
        FMUL TM, DPX(FAX);             "self trans * forward arc
        INCTMA;                        "start transfer of null trans
        DPX(OFAX) < DPY(SOFAY)         "push queue of forward arcs
        FADD;
        FMUL FM, DPY(LIY);             "last input * feneme prob
        INCTMA;                        "start transfer of phone num
        DPY(SOFAY) < DPX(FAX)          "zero out 2nd oldest farc
"...................................................................."
" Loop for index = 1 to loopcnt (= lexeme length - start row)        "
"...................................................................."
EXTLOOP: FMUL TM, FA;                  "null trans * last input
        INC INCOL; SETMA;              "start transfer of col()
        DPY(LIY) < FA;                 "save last input
        FADD DPY(PROFY), FA;           "profile = prof + last input
        DPX(FENX) < MD
        FMUL DPX(FENX), DPY(TPY);      "feneme prob * tail prob
        DPY(SELFY) < FM;               "save self arc
        LDSPI PHONE; DB < TM;          "save the phone number
        INCTMA;                        "start trans of self trans
        FADD                           "push the adder

        FMUL;
        DPY(PROFY) < FA;               "save the running profile
        ADD FENBAS, PHONE; SETMA;      "lookup feneme prob
        DPX(FAX) < FM;                 "save forward arc
        FADD DPY(SELFY), DPX(OFAX)     "self + oldest_farc
        FADD FM, MD;                   "col() + null*last_input
        FMUL TM, DPX(FAX);             "self trans * forward arc
        INCTMA;                        "start transfer of null trans
        DEC LOOPCNT;                   "decrement the loop count
        DPX(OFAX) < DPY(SOFAY)         "push queue of forward arcs
        FADD;
        FMUL FM, DPY(LIY);             "last_input * feneme prob
        INCTMA;                        "start transfer of phone num
        INC OUTCOL; SETMA;             "put out next output
        MI < FA;
        DPY(SOFAY) < DPX(FAX);         "push forward arc queue
        BGT EXTLOOP                    "keep looping until done (BNE)
"...................................................................."
" Trail out of loop.                                                 "
"...................................................................."
        FMUL TM, FA;                   "calculate last forward arc
        DPY(LIY) < FA;                 "save last input
        FADD DPY(PROFY), FA            "profile = prof + last input
        FMUL;                          "push the multiplier
        DPY(SELFY) < FM;               "save self arc
        INCTMA;                        "start trans of self trans
        FADD                           "push the adder
        FMUL;
        DPY(PROFY) < FA;               "save the running profile
        DPX(FAX) < FM;                 "save forward arc
        FADD DPY(SELFY), DPX(OFAX)     "self + oldest farc
        FADD;                          "push the adder
        FMUL TM, DPX(FAX);             "self trans * forward arc
        DPX(OFAX) < DPY(SOFAY)         "push queue of forward arcs
        FMUL;                          "push the multiplier
        INC OUTCOL; SETMA;             "put out next output
        MI < FA;
        DPY(SOFAY) < DPX(FAX)          "push forward arc queue
"...................................................................."
" Push last outputs out of loop.                                     "
"...................................................................."
        FMUL                           "push the multiplier
        FADD FM, DPX(OFAX)             "self + oldest farc
        FADD;                          "push the adder
        DPX(OFAX) < DPY(SOFAY)         "push queue of forward arcs
        INC OUTCOL; SETMA;             "put out next output

        MI < FA;
        DPY(SOFAY) < DPX(FAX)          "push forward arc queue
        INC OUTCOL; SETMA;
        MI < DPX(OFAX)                 "push out oldest forward arc
        RETURN                         "finished EXTCOL

APPENDIX 2

"...................................................................."
" SUBROUTINE APFM                                                    "
"...................................................................."
" This program implements the acoustic Fast Match in the FPS Array   "
" Processor. This is the modified Fast Match that runs without       "
" explicit length distributions.                                     "
"...................................................................."

"...................................................................."
" SUBROUTINE EVALPP                                                  "
"...................................................................."
" This routine performs the actual fast match calculation for the    "
" current lattice node. The main program only calls this routine to  "
" evaluate valid nodes - not the null nodes that correspond to       "
" leaves.                                                            "
"                                                                    "
" Initializations... given the current lattice node number, look up  "
" the corresponding clink number, set up match parameters such as    "
" the length of the start time distribution, pointers to the start   "
" time distribution in the boundary stacks, and the offset into the  "
" feneme stream.                                                     "
"                                                                    "
" Initial zeroes = 4:                                                "
"                                                                    "
" Pad the start time distribution with 4 zeroes, increase SDLEN by 4 "
" to simplify looping after start time distribution ends.            "
"                                                                    "
" Initialize output distribution (time - 1), output sum, set feneme  "
" prob for first time slice equal to zero by clearing the multiplier."
"                                                                    "
"   output_distribution (0) = 0.0;                                   "
"   output_sum = 0.0;                                                "
"   feneme_prob = 0.0;                                               "
"   state_1 = 0.0;                                                   "
"   state_2 = 0.0;                                                   "
"   state_3 = 0.0;                                                   "
"   state_4 = 0.0;                                                   "
"...................................................................."
ZERO4:  ADD# SDARY, SDLEN;             "point to last sample in the
        SETMA;                         "start time distribution
        DPX(LASTX) < ZERO;             "zero the last output sample
        DPY(OSY) < ZERO                "zero the output sum
        INCMA;                         "last sample + 1
        MI < ZERO;                     "pad out with zero
        DPY(ST1Y) < ZERO;              "state 1 = 0.0
        INC SDLEN                      "sdlen = sdlen + 1
        INCMA;                         "last sample + 2
        MI < ZERO;                     "pad out with zero
        DPY(ST2Y) < ZERO;              "state 2 = 0.0
        INC SDLEN                      "sdlen = sdlen + 2
        INCMA;                         "last sample + 3
        MI < ZERO;                     "pad out with zero
        DPY(ST3Y) < ZERO;              "state 3 = 0.0
        INC SDLEN                      "sdlen = sdlen + 3
        INCMA;                         "last sample + 4
        MI < ZERO;                     "pad out with zero
        DPY(ST4Y) < ZERO;              "state 4 = 0.0
        INC SDLEN                      "sdlen = sdlen + 4
        CLR TIME;                      "output time counter = 0
        FMUL DPX(LASTX), DPY(OSY)      "clear the multiplier
        MOV SDLEN, LOPLIM;             "1st loop limit = sdlen
        FMUL                           "push the multiplier
"...................................................................."
" First loop: initial zeroes = 4                                     "
"                                                                    "
" Calculate output distribution value for current time, update       "
" output sum, calculate feneme probability for the next time slice.  "
"                                                                    "
"   do time = 1 to start_time_length + 4;                            "
"     output_distribution (time) =                                   "
"       feneme_prob * (output_distribution (time - 1) + state_1);    "
"     output_sum = output_sum + output_distribution (time);          "
"     state_1 = state_2 * feneme_prob;                               "
"     state_2 = state_3 * feneme_prob;                               "
"     state_3 = state_4 * feneme_prob;                               "
"     state_4 = st_array (time);                                     "
"     feneme_prob = fd_array (local_buffer(first_feneme + time))     "
"                   * tail_buffer (first_feneme + time);             "
"   end;                                                             "
"...................................................................."
L41:    INC LFARY;                     "start transfer of next
        SETMA;                         "feneme symbol from stream
        FADD DPX(LASTX), DPY(ST1Y);    "add last output + state 1
        FMUL                           "push the multiplier
        INC TPARY;                     "start transfer of next
        SETMA;                         "tail probability
        FMUL FM, DPY(ST2Y);            "state 2 * feneme_prob
        FADD;                          "push the adder pipeline
        DPX(FPX) < FM                  "save feneme_prob
        INC SDARY;                     "transfer next starting pt
        SETMA;
        FMUL DPX(FPX), FA              "(last+st1) * feneme_prob
        LDSPI FDOFF;                   "get the current feneme
        DB = MD;                       "symbol from bus
        FMUL DPX(FPX), DPY(ST3Y)       "state 3 * feneme_prob
        ADD# FDOFF, FDARY;             "start transfer of next
        SETMA;                         "feneme probability
        DPX(TPX) < MD;                 "save next tail probability
        FMUL DPX(FPX), DPY(ST4Y);      "state 4 * feneme prob
        DPY(ST1Y) < FM                 "state_1 = state_2 * fp
        FMUL;                          "push the multiplier
        DPX(LASTX) < FM;               "save output sample
        DPY(ST4Y) < MD                 "store next input sample
        INC TIME;                      "update the time counter
        FMUL;                          "push the multiplier
        DPY(ST2Y) < FM;                "state_2 = state_3 * fp
        FADD DPX(LASTX), DPY(OSY)      "add output to output sum
        DEC LOPLIM;                    "at end of loop?
        FMUL DPX(TPX), MD;             "feneme * tail prob
        DPY(ST3Y) < FM;                "state_3 = state_4 * fp
        FADD                           "push the adder
        BGT L41;                       "keep looping if not done
        FMUL;                          "push the multiplier
        DPY(OSY) < FA;                 "save output sum
        INCTMA;                        "update pointer to scratch
        DB = DPX(LASTX);               "area for output dist, save
        OUT                            "the output sample
"...................................................................."
" Second loop:                                                       "
"                                                                    "
" Time is now equal to start time length + initial zeroes, so the    "
" start time distribution has ended and all internal states are      "
" equal to zero. Therefore, this section of code is common to all    "
" cases of initial zeroes.                                           "
" Loop until time limit (start time length + ld length - 1), or      "
" until the output falls below loop cutoff.                          "
"                                                                    "
"   do time = start_time_length + 1 + initial_zeroes to time_limit   "
"     while (output_distribution (time) >= loop_cutoff);             "
"                                                                    "
"     output_distribution (time) =                                   "
"       feneme_prob * output_distribution (time - 1);                "
"     output_sum = output_sum + output_distribution (time);          "
"     feneme_prob = fd_array (local_buffer(first_feneme + time))     "
"                   * tail_buffer (first_feneme + time);             "
"   end;                                                             "
"...................................................................."
EVAL2:  MOV LDLEN, LOPLIM              "loop limit = ld_len - 1
        SUB INTZERO, LOPLIM            "           - initial_zeroes
        DEC LOPLIM
        BGT L42                        "if loop limit > 0, do it
        JMP EVOUT                      "otherwise, jump to exit
L42:    INC LFARY;                     "start transfer of next
        SETMA;                         "feneme symbol from stream
        FMUL                           "push the multiplier
        INC TPARY;                     "get next tail prob
        SETMA;                         "from buffer
        FMUL FM, DPX(LASTX)            "last = last * feneme prob
        FMUL                           "push the multiplier
        LDSPI FDOFF;                   "get feneme symbol
        DB = MD;                       "from input stream
        FMUL                           "push the multiplier
        ADD# FDOFF, FDARY;             "use feneme as pointer into
        SETMA;                         "probability array
        DPX(TPX) < MD;                 "save the tail probability
        FSUBR FM, DPY(LCY)             "compare output to cutoff
        FADD FM, DPY(OSY);             "add output into output sum
        DPX(LASTX) < FM                "save output in register
        INC TIME;                      "increment time counter
        FADD                           "push the adder
        DEC LOPLIM;                    "see if we are done yet
        FMUL DPX(TPX), MD;             "feneme * tail probability
        BFGE EVOUT;                    "quit if output < cutoff
        DPY(OSY) < FA                  "save output sum
        BGT L42;                       "if not done, keep looping
        FMUL;                          "push the multiplier
        INCTMA;                        "update pointer to scratch
        DB = DPX(LASTX);               "area for output dist, save
        OUT                            "the output sample


Administrative Status


Event History

Description Date
Inactive: Expired (old Act Patent) latest possible expiry date 2006-03-24
Grant by Issuance 1988-12-06

Abandonment History

There is no abandonment history.

Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
INTERNATIONAL BUSINESS MACHINES CORPORATION
Past Owners on Record
LALIT R. BAHL
PETER V. DESOUZA
ROBERT L. MERCER
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description    Date (yyyy-mm-dd)    Number of pages    Size of Image (KB)
Abstract                1993-08-24           1                  26
Drawings                1993-08-24           26                 442
Claims                  1993-08-24           12                 272
Cover Page              1993-08-24           1                  16
Descriptions            1993-08-24           90                 2,533