Patent 1149961 Summary

(12) Patent: (11) CA 1149961
(21) Application Number: 368883
(54) English Title: CREATING VISUAL IMAGES OF LIP MOVEMENTS
(54) French Title: SYSTEME DE CREATION D'IMAGES DU MOUVEMENT DES LEVRES
Status: Expired
Bibliographic Data
Abstracts

English Abstract


Abstract of the Disclosure

A system and method that creates visual images of lip move-
ments on film, video tape, or other recorded media. Speech
sounds are analyzed, digitally encoded and transmitted to
a data memory device. Stored within the data memory device
is a program for producing output data that creates
visual images of lip movements corresponding to the speech
sounds. Under control of the data for the speech sounds,
the graphical output from the data memory device is sent to
a graphic output device and related display equipment to
produce the graphical display. This display may be combined
with the speech sounds so that the resultant audio-visual
composite, such as a film strip, contains lip movements
corresponding to the speech sounds.


Claims

Note: Claims are shown in the official language in which they were submitted.



The embodiments of the invention in which an exclusive
property or privilege is claimed are defined as follows:

1. A method of graphically creating lip images which com-
prises providing a phoneme encoded representation of speech
sounds, providing an encoded representation of selected lip
configurations, transmitting both of said encoded represen-
tations to a data memory device for storage therein, and
transmitting from said data memory device to a graphical
output device and under control of a coded speech input
and a coded selected lip configuration, both of which are
transmitted to said data memory device, coded signals that
cause said graphical display device to display lip images
approximately corresponding to the uncoded form of said
coded speech input.

2. A method according to claim 1 in which the coded speech
input selects a phoneme output from the data memory device,
which phoneme output most nearly represents the speech input.

3. A method of creating visual images of lip movements cor-
responding to speech sounds comprising providing a coded
representation of speech sounds that are associated with
lip movements from visual information, which lip movements
do not correspond to the speech sounds, transmitting said
coded representation to a data memory device for storage
therein, storing in said data memory device coded data for
creating a graphical representation of new lip movements
corresponding to the speech sounds, and transmitting from
said data memory device to a graphical output device
and under control of the data representing the coded speech
sounds, the data for the new lip movements corresponding to
said speech sounds.

4. A method according to claim 3 including extracting the
coded speech sounds from a series of frames of audio and
said visual information, storing said coded speech sounds
in said data memory device on a frame-by-frame basis,
optically scanning said series of frames in sequence to
provide an encoded graphical image of the visual informa-
tion and transmitting the encoded graphical image data to
said data memory device, and transmitting from said data
memory device to said graphical output device the visual
information data and the data for the new lip movements
that replace the first-mentioned lip movements.

5. A method according to claim 4 in which the frames that
are scanned are images from frames on a strip of photo-
graphic film, and the speech sounds are on a sound track
for the film.

6. A method of creating audio-visual media having visual
images and associated speech sounds comprising providing a
coded representation of speech sounds, transmitting said
coded representation to a data memory device for storage
therein, storing in said data memory device coded data for
creating graphical representations corresponding to the
speech sounds, transmitting from said data memory device
to a graphical output device and under control of the data
for the coded speech representation the data for the
graphical representations corresponding to the speech
sounds, and combining the graphical representations with
visual images in a predetermined manner on an audio-
visual recorded medium.

7. Apparatus for creating visual images of lip movements
corresponding to speech sounds comprising means for pro-
viding a coded representation of speech sounds that are
associated with lip movements from visual information, a
data memory device, means for transmitting said coded rep-
resentation to said data memory device for storage therein,
said data memory device having stored therein coded data
for creating a graphical representation of predetermined
lip movements corresponding to the speech sounds, a graphical
output device, and means for transmitting to said graphical
output device and under control of the coded speech repre-
sentation data, the data for the predetermined lip move-
ments corresponding to the speech sounds.

8. Apparatus according to claim 7 including means for ex-
tracting the coded speech sounds from a series of frames
of audio and said visual information, means for transmit-
ting said coded speech sounds to said data memory device
on a frame-by-frame basis for storage therein on said
basis, and means for optically scanning said series of
frames in sequence to provide an encoded graphical image
of the visual information.



Description

Note: Descriptions are shown in the official language in which they were submitted.


CREATING VISUAL IMAGES
OF LIP MOVEMENTS

Background of the Invention

This invention relates to systems and methods for creating
visual images responsive to analyzed speech data so as to
produce a graphical representation of known type such as
the lip movements corresponding to the speech sounds. The
invention is particularly suitable for creating visual
images of lip movements in films, video tapes, and on other
recorded media.

In the production of many types of audio-visual media the
speech sounds and the visual images are recorded simul-
taneously. For example, in the making of motion pictures
or like audio visual recordings, the voice of the actor
is recorded on the sound track at the same time that the
actor is emitting speech sounds. Where the film is intended
to be played as originally produced, the speech sounds of
the sound track correspond to the lip movements emitted.
However, it frequently happens that the audio portion or
sound track is to be in a language other than the original
one spoken by the actor. Under such circumstances a new
sound track in another language is "dubbed in". When this
is done the speech sounds do not correspond to the lip
movements, resulting in an audio-visual presentation that
looks unreal or inferior.


In animated cartoons it is also a problem to provide lip
movements which correspond to the speech sound. This may
be done, however, by utilizing individual art work or
drawings for the lip movements, sometimes as many as
several per second. Because of the necessity of making
numerous drawings by hand or other laborious art tech-
niques, the cost of animated cartoons tends to be high.

Summary of the Invention
In accordance with this invention predetermined visual
images such as lip movements are graphically created to
correspond with speech sounds so that when the visual
images and speech sounds are recorded on film, video tape
or other media, the presentation (listening and viewing)
will tend to be more real and interesting. The method and
apparatus of the invention creates graphical images with
a minimum of human effort by the unique utilization of
computerized graphic techniques.
In further accordance with this invention there is pro-
vided a coded representation of speech sounds. These
speech sounds may be associated with lip movements from
visual information, which lip movements do not correspond
to the speech sound as in a "dubbed in" sound track of a
motion picture. The coded representations of the speech
sounds are transmitted to a data memory device (e.g. com-
puter) for storage therein. There is also stored in the
data memory device coded data for creating a predetermined
graphical representation (e.g. lip movements) corresponding
to the speech sounds. This coded data or software is
intended to respond to the coded data representing the
speech sound so that the coded speech sounds can instruct
the computer to send out graphical signals of new lip
movements or other graphical representation corresponding
to the speech sounds. The new lip movement data is thus
transmitted to a graphic output device of known type from
which a suitable graphic display may be created. This
graphic display may be, for example, a video display or
a film frame. The audio or speech portions may be com-
bined in correlation with the graphical display.
When a motion picture film has a "dubbed in" sound
track in which lip movements are not in correspondence
with the speech sounds, the encoding of the lip movements
may be done on a frame-by-frame basis. Thus, the coded
speech sounds may be extracted from the sound track,
frame-by-frame, and sent to the computer. Likewise, the
computer may receive information as to the mouth position
on each frame as well as information relating to the mouth
shape of the actor. The entire film may be optically
scanned on a frame-by-frame basis so that each frame with
mouth location and mouth configuration data may be stored
in the computer in digital form along with data in digital
form as to the analyzed speech sounds. When the informa-
tion is sent out from the computer to the graphical output
device, the data for the speech sounds causes the computer
to send out the proper graphical output signals to the
graphical output device corresponding to the particular
speech sounds on the sound track of the film. Thus, the
film is reconstructed to the extent necessary to change
the mouth shape and lip movement or configuration to cor-
respond with the speech sound.

Typical apparatus of the present invention for creating
visual images of lip movements comprises means such as a
speech analyzer for providing a coded representation of
speech sounds, a data memory device, means for transmit-
ting said coded representation to said data memory device
for storage therein, said data memory device having stored
therein coded data for creating a graphical representation
of lip movements corresponding to the speech sounds, a
graphical output device, and means for transmitting to
said graphical output device and under control of the
coded speech representation data, the data for the lip movements
corresponding to the speech sounds.

The apparatus further includes means for extracting the
coded speech sounds from a series of frames of audio and
visual information, means for transmitting said coded speech
sounds to said data memory device on a frame-by-frame
basis for storage therein on that basis. Means are also
provided for optically scanning a series of frames in se-
quence to provide an encoded graphical image of the visual
information. Means are provided for transmitting the en-
coded graphical image data to the computer or data memory
device. Means are also provided for transmitting from the
data memory device to the graphical output device the
visual information data plus the data for the new lip move-
ments which, in a corrected film, replace the old lip move-
ments on the film.

Brief Description of the Figures
Fig. 1 is a diagram showing the arrangement for storing of
phoneme and graphic codes and forming part of the present
invention;

Fig. 2 is a diagram showing the manner of using the codes
to display visual images;

Fig. 3 is a modified form of the invention showing the en-
coding of lip movement corrections; and
Fig. 4 is a diagram showing an arrangement for graphically
displaying the lip movement corrections encoded by the ar-
rangement of Fig. 3.

Detailed Description

Referring now in more detail to the drawing, and particu-
larly to Fig. 1, there is shown an arrangement for storing
phoneme and graphic codes into a digital electronic com-
puter. One set of codes represents the spoken phoneme of
a language (e.g. the English language). The other codes
are graphic codes representing visual images of lips of
various mouth types such as male, female, cartoon animal,
etc. together with orientations of the mouth such as front
view, three quarter view, side view, etc.

More particularly, a person such as an actor pronounces
phoneme into a voice encoder. The voice encoder translates
the phoneme into a digital electronic phoneme code which is
transmitted to a digital computer and stored in its memory.
The phonemes for an entire language may thus be digitally
coded. In conjunction with the phoneme code an artist may
draw one or more mouth shapes. A programmer or electronic
graphic scanning device encodes the artist's drawing into
graphic code, which code is also sent for storage into the
electronic digital computer. The foregoing is repeated
until a complete set of phoneme and graphic codes is
stored in digital form representing the basic phoneme
code, standard mouth types and orientations, etc. as here-
tofore stated.

A phoneme code is a representation of intensities of sound
over a specified series of frequencies. The number of fre-
quencies selected depends upon the degree of refinement of
the code. Typically, three frequencies may be used to ob-
tain three intensities (decibel level), one for each fre-
quency. The English language has sixty-two phonemes.
Thus, each of the sixty-two phonemes will be coded at
three selected frequencies. A discussion of voice analysis
may be found in the publication Interface Age, issue of
May 5, 1977, pages 56-67.
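
As an illustration only (the patent does not give an implementation),
such a phoneme code can be pictured as a small table of intensities at
the three selected frequencies, with an unknown sound matched to the
nearest stored entry. The phonemes, intensity values and function names
in the sketch below are assumptions, not part of the disclosure.

    # Hypothetical sketch of the phoneme code described above: each phoneme
    # is held as sound intensities (decibel levels) at three selected
    # frequencies, and an incoming measurement is matched to the nearest
    # stored phoneme.
    from math import dist

    # phoneme symbol -> (dB at frequency 1, dB at frequency 2, dB at frequency 3)
    PHONEME_CODES = {
        "ah": (62.0, 48.0, 31.0),
        "ee": (55.0, 57.0, 40.0),
        "oo": (60.0, 35.0, 22.0),
        # ... one entry for each phoneme of the language
    }

    def nearest_phoneme(measured):
        """Return the stored phoneme whose three-frequency code is closest
        (Euclidean distance) to the measured intensities."""
        return min(PHONEME_CODES, key=lambda p: dist(PHONEME_CODES[p], measured))

    print(nearest_phoneme((61.0, 47.0, 33.0)))   # prints "ah"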

Thus, an actor 2, speaking into a microphone 4, transmits
phoneme to a voice encoder 6, which digitally encodes the
phoneme and transmits the encoded data to a data memory
device 8. This data memory device may be any known type
of electronic digital computer. An example is the model
PDP-11/40 of Digital Equipment Corporation, Maynard, Mas-
sachusetts. An artist may produce a drawing 10 of a par-
ticular lip or mouth shape. This drawing 10 may be graph-
ically encoded by the programmer or electronic graphic
scanning device 12. This unit may be of the type described
in United States Patent 3,728,576 and is basically an opti-
cal scanner which encodes the artist's drawing into a
graphic digital code for transmission to the computer 8.

The voice encoder 6, previously referred to, is sometimes
known as a speech encoder and is a known piece of equip-
ment. Such a device is sold under the trademark SPEECH
LAB and is obtainable from Heuristics, Inc. of Los Altos,
California. The voice encoder 6 is a device which trans-
lates the phoneme into a digital electronic phoneme code.

The artist will draw as many mouth or lip shapes 10 as
may be necessary to encode the computer 8 with a complete
phoneme language code and all of the mouth or lip shapes
which may be needed for subsequent graphical reproduction.

Referring now to Fig. 2, there is shown the output or
playback mode of the present invention. A keyboard 13 is
used to select a mouth type (male, female, etc.) and orienta-
tion (front, three-quarter, side, etc.) from among the
previously encoded lip or mouth shapes. The keyboard is
of a known type and may be, for example, a DEC LA 36
DECWRITER II and/or VT 50 DECSCOPE, products of Digital
Equipment Corporation. The keyboard 13 is connected to
the computer 8 so that the mouth type, mouth orientation,
etc. may be chosen by keying in the desired selection.

The actor 2, speaking into the microphone 24, reads a script
or other voice material into the voice encoder 60 which is
similar to the voice encoder 6 previously described. The
voice encoder 60 translates the actor's voice into a
digital electronic voice code. The output of the encoder
60 is transmitted to the computer 8. Under control of the
keyed-in signal from the keyboard 13 and of the encoded
output of the voice encoder 60, the data memory device or
computer 8 sends to display device 14 signals corresponding
to the selected graphic code from its memory and also the
phoneme code which most closely matches the actor's encoded
voice. This display device 14 converts the graphic codes
and size information into visual information such as the
lip shape 16 shown. The visual images 16 can be recorded
on film or other audio/visual media. For example, visual
images may be enlarged into framed transparencies for over-
lay into compounded frames.
Thus, the playback mode of the present arrangement shown in
Fig. 2 allows a simple selection of mouth orientation and
related mouth characteristics to be simply keyed into the
computer which has the various mouth information stored
therein. At the same time the voice of the actor 2 may
be encoded to provide an input signal to the computer caus-
ing it to produce a phoneme output most nearly in accord-
ance with the coded signals. As a result, the output from
the computer 8 to the graphic display 14 is controlled by
the keyboard input from the keyboard 13 and the output
from the voice encoder 60, the latter of which determines
the lip configuration shown in the graphic display 16.
Thus, if the actor pronounces an "ah" sound into the
microphone 24, the coded input signal to the computer 8
will find or select the nearest phoneme code in accordance
with known data comparison techniques. This code will then
be used to provide a predetermined output to display device
14 that will result in an "ah" shaped lip configuration in
display 16.
The graphic display device 14 is itself a known item and
may, for example, be a RAMTEK G-100-A color display system,
sold by Ramtek Corporation of Sunnyvale, California.
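
A minimal sketch of this playback selection, assuming the stored graphic
codes are indexed by mouth type, orientation and phoneme (all names and
entries below are illustrative, not taken from the patent):

    # Sketch of the Fig. 2 playback mode: keyboard 13 supplies the mouth
    # type and orientation, the voice encoder 60 supplies the phoneme, and
    # the computer 8 looks up the matching stored lip image for display 14.
    GRAPHIC_CODES = {
        # (mouth type, orientation, phoneme) -> stored graphic code
        ("male", "front", "ah"): "lip_image_0017",
        ("male", "front", "ee"): "lip_image_0018",
        ("female", "three-quarter", "ah"): "lip_image_0231",
        # ... one entry for each encoded drawing
    }

    def select_lip_image(mouth_type, orientation, phoneme):
        """Return the graphic code for the keyed-in mouth selection and the
        phoneme nearest the actor's encoded voice (see the earlier sketch)."""
        return GRAPHIC_CODES[(mouth_type, orientation, phoneme)]

    # e.g. select_lip_image("male", "front", "ah") -> "lip_image_0017",
    # which is then sent to the graphic display device 14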




It is possible to overlay directly the constructed visual
image 16 onto an existing film or other audio/visual medium
automatically. In such procedure the original film is
converted by an electronic graphic scanning device, such
as is shown at 12 in Fig. 1 and previously described,
into what is known as "pixels". These are electronic
digital codes representing the light intensity at a large
number of points on the screen. The "pixels" are analyzed
by an electronic digital computer by various algorithms
to determine the size, orientation and/or location of fea-
tures (in this case the mouth). The pixels in the local
region of the located mouth can be replaced in the elec-
tronic digital computer memory by existing computer instruc-
tions of graphic codes from the sets of phoneme graphic
codes stored previously therein and selected by the ar-
rangement shown and described with respect to Fig. 2. The
resulting pixels representing the original frame with
mouth replaced can be sent to an electronic graphic display
device for display and recording.
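
The pixel-replacement step can be pictured with the following sketch,
which assumes the scanned frame is held as a two-dimensional array of
pixel values and that the mouth has already been located by the
feature-finding algorithms mentioned above; the array layout and
function names are assumptions, not part of the disclosure.

    # Sketch of replacing the pixels in the local mouth region of a scanned
    # frame with a selected lip image.  frame_pixels and lip_pixels are
    # 2-D lists of pixel values; (top, left) is the located mouth position.
    def replace_mouth_region(frame_pixels, lip_pixels, top, left):
        """Overwrite the rectangle of frame_pixels starting at (top, left)
        with the pixels of the selected lip image."""
        for row_offset, lip_row in enumerate(lip_pixels):
            for col_offset, value in enumerate(lip_row):
                frame_pixels[top + row_offset][left + col_offset] = value
        return frame_pixels

    # typical use, with locate_mouth() standing in for the feature-finding
    # algorithms (hypothetical, not defined here):
    #   top, left = locate_mouth(frame_pixels)
    #   replace_mouth_region(frame_pixels, new_lip_pixels, top, left)
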
Fig. 3 and Fig. 4 show a modified form of the invention
which may be used for correcting the lip movements in mo-
tion picture film. Fig. 3 shows a motion picture film 20
having a series of frames 22, 24 etc. that include a
visual image 25 and a sound track 26. The sound of the
sound track may be a foreign language dubbed in, result-
ing in a sound which does not correspond to the lip move-
ments in the various frames. Accordingly, the film 20 may
be run through a sound projector 28 that embodies a frame
counter that sends frame count output pulses over conductor
30 to the digital memory device or computer 8. The sound
projector 28 also projects an image 32 on a suitable
screen. This screen may be a so-called inter-active
graphic tablet. A stylus 34 is used in a known fashion to
select the mouth position relative to coordinates on the
graphic tablet 32. The stylus 34 records the position of
the mouth as a digital code and in accordance with known
techniques transmits the information over a conductor 36
for storage into the computer 8. If needed, a keyboard 40
is also utilized whereby data representing a mouth type or
other configuration may be transmitted to the computer 8.

An encoder 6a may also be used and is of the type similar
to the encoder 6 previously described. This encoder trans-
mits the digital phoneme into the computer 8. Further-
more, the output sound from the projector as an electrical
signal is transmitted over conductor 42 to the encoder 6a,
such electrical signal representing the output sound from
the sound track 26.

Thus, the digital computer 8 has stored therein considerable
data in coded form. This data consists of the frame counts,
the mouth position, the phoneme code, and the mouth type.
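
Read as a data structure, the information gathered by the Fig. 3
arrangement amounts to one record per frame; the field names below are
illustrative only, not taken from the patent.

    # Hypothetical per-frame record for the data gathered in Fig. 3:
    # the frame count from projector 28, the mouth position from the
    # graphic tablet 32, the phoneme code from encoder 6a, and the mouth
    # type keyed in at keyboard 40.
    from dataclasses import dataclass

    @dataclass
    class FrameRecord:
        frame_count: int          # from the projector's frame counter
        mouth_position: tuple     # (x, y) coordinates on the graphic tablet
        phoneme_code: tuple       # intensities at the three selected frequencies
        mouth_type: str           # e.g. "male", "female", "cartoon animal"

    film_data = {}                # keyed by frame count
    film_data[22] = FrameRecord(22, (118, 203), (61.0, 47.0, 33.0), "male")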

Turning now to Fig. 4, the playback or output arrangement is
shown. The images 25 of frames 22, 24 etc. are scanned by
a conventional optical scanner 50 which sends a digitally
coded image for each frame over conductor 52 to the com-
puter 8. At the same time a pulse advance is supplied over
conductor 54 to advance the frame of the film. The output
signal from the digital computer 8 is sent to a graphic
output device 56 which provides a graphic display 58 that
has the new lip movements thereon. Thus, the arrangement
provides for the encoding of the sound from the sound
track 26 and utilizing that data to create a new lip con-
figuration corresponding to the sound of the sound track.
The graphic display 58 may be recombined with the sound
in the form of a new film, videotape, or the like.
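
Taken as a procedure, the Fig. 4 output pass might look like the sketch
below; scan_frame(), display() and advance_film() are hypothetical
stand-ins for the optical scanner 50, the graphic output device 56 and
the pulse advance on conductor 54, and the lookup helpers are those of
the earlier sketches.

    # Sketch of the Fig. 4 playback pass: for each frame, scan the image,
    # fetch the stored per-frame data, choose the lip image for the
    # sound-track phoneme, overlay it at the recorded mouth position, and
    # send the corrected frame to the graphic output device.
    def playback(film_data, lip_images, total_frames):
        for frame_count in range(1, total_frames + 1):
            frame_pixels = scan_frame(frame_count)          # scanner 50
            record = film_data[frame_count]                 # stored per Fig. 3
            phoneme = nearest_phoneme(record.phoneme_code)  # earlier sketch
            lip_pixels = lip_images[(record.mouth_type, phoneme)]
            top, left = record.mouth_position
            replace_mouth_region(frame_pixels, lip_pixels, top, left)
            display(frame_pixels)                           # graphic output 56
            advance_film()                                  # pulse on conductor 54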




"




Administrative Status

Title                    Date
Forecasted Issue Date    1983-07-12
(22) Filed               1981-01-20
(45) Issued              1983-07-12
Expired                  2000-07-12

Abandonment History

There is no abandonment history.

Payment History

Fee Type          Anniversary Year   Due Date   Amount Paid   Paid Date
Application Fee                                 $0.00         1981-01-20
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
BLOOMSTEIN, RICHARD W.
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents


Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Description            1994-01-14          9                 413
Drawings               1994-01-14          2                 38
Claims                 1994-01-14          3                 117
Abstract               1994-01-14          1                 22
Cover Page             1994-01-14          1                 13