Note: Descriptions are shown in the official language in which they were submitted.
WO 91/17170 PCT/US91/02954
2063593
MULTIMERIC GELSOLIN FUSION CONSTRUCTS
TECHNICAL FIELD OF INVENTION
This invention relates to multimeric and
hetero-multimeric gelsolin fusion constructs,
compositions containing them and methods using them.
More particularly, this invention relates to multimeric
gelsolin fusion constructs in which at least two
gelsolin fusion polypeptides are bound to vesicles
containing polyphosphoinositides. This invention also
relates to gelsolin fusion polypeptides which comprise
gelsolin moieties linked to functional moieties and, in
particular, to CD4-gelsolin fusion polypeptides
comprising an amino acid sequence for a human CD4
protein linked to a gelsolin moiety.
BACKGROUND ART
The rapid development of biotechnologies has
led to novel delivery and carrier systems for
pharmaceuticals, vaccines, diagnostics and other
bioactive molecules. Optimally, these systems enhance
the properties of the molecules they carry, complement
those molecules with characteristics they lack and
combine useful characteristics of different molecules.
Of particular interest to researchers are the serum
half-life of bioactive molecules, their affinity for
target particles and cells, targetability of bioactive
molecules, bioactivity, immunogenicity and the ability
to administer or deliver several molecules
simultaneously. Scientists are seeking to identify new
molecules, including proteins, that they can
advantageously develop into these systems.
Gelsolin is a protein found in mammals and
other vertebrates [H.L. Yin and T.P. Stossel, "Control
of Cytoplasmic Actin Gel-sol Transformation by
Gelsolin, a Calcium-dependent Regulatory Protein",
Nature, 281, pp. 583-86 (1979); F.S. Southwick and M.J.
DiNubile, "Rabbit Alveolar Macrophages Contain a Ca2+-
sensitive, 41,000-dalton Protein Which Reversibly
Blocks the 'Barbed' Ends of Actin Filaments but Does
not Sever Them", J. Biol. Chem., 261, pp. 14191-95
(1986); T. Ankenbauer et al., "Proteins Regulating
Actin Assembly in Oogenesis and Early Embryogenesis of
Xenopus laevis: Gelsolin Is the Major Cytoplasmic
Actin-binding Protein", J. Cell Biol., 107, pp. 1489-98
(1988); H.L. Yin et al., "Identification of Gelsolin, a
Ca2+-dependent Regulatory Protein of Actin Gel-sol
Transformation and Its Intracellular Distribution in a
Variety of Cells and Tissues", J. Cell. Biol., 91,
pp. 901-06 (1980); C.W. Dieffenbach et al., "Cloning of
Murine Gelsolin and Its Regulation During
Differentiation", J. Biol. Chem., 264, pp. 13281-88
(1989)]. In mammals, gelsolin occurs in two forms -- a
cytoplasmic form and a serum form. Gelsolin regulates
the activity of actin, a major protein involved in cell
structure and movement. Actin is a globular protein
with a slightly elongated shape that can polymerize
into filaments. Polymerization occurs when the
"barbed" end of one actin monomer binds non-covalently
and reversibly to the "pointed" end of another. Inside
most cells, monomers and short filaments exist in a
fluid-like "sol" state until the monomers are activated
to polymerize into filaments and the filaments, in
turn, are activated to crosslink, producing a firmer
"gel" phase that forms part of the cellular
cytoskeleton. Investigators have observed that in the
presence of calcium ion, gelsolin prevents the
transition of monomers and filaments from sol phase to
gel phase.
Gelsolin acts on actin in three ways. First,
it severs the noncovalent bonds between the actin
monomers that compose actin filaments ("severing").
Second, it binds to the barbed end of actin filaments
and prevents elongation of the filament from that end
("capping"). Third, it binds to actin monomers and
promotes the formation of actin filaments by providing
a nucleus for polymerization ("nucleation"). The
result is a steady state which favors short actin
filaments unable to support the gel phase [P.A. Janmey
et al., "Interactions of Gelsolin and Gelsolin-actin
Complexes with Actin. Effects of Calcium on Actin
Nucleation, Filament Severing, and End Blocking",
Biochemistry, 24, pp. 3714-23 (1985)].
Gelsolin's actin-severing function is
stoichiometric: one gelsolin molecule binds to two
monomers on the actin filament, breaks the filament,
and remains bound to both monomers. The binding of
gelsolin to one of the monomers is Ca++ dependent, and
chelating agents such as EGTA cause dissociation of
gelsolin from only one monomer.
Scientists have identified two phosphatidyl
inositol phosphate phospholipids that bind to and
regulate the function of gelsolin. They are
phosphatidylinositol 4-monophosphate (PIP) and
phosphatidylinositol 4,5-biphosphate (PIP2) [P.A. Janmey
et al., "Polyphosphoinositide Micelles and
Polyphosphoinositide-containing Vesicles Dissociate
Endogenous Gelsolin-actin Complexes and Promote Actin
Assembly from the Fast-growing End of Actin Filaments
Blocked by Gelsolin", J. Biol. Chem., 262, pp. 12228-36
(1987); P.A. Janmey and T.P. Stossel, "Modulation of
Gelsolin Function by Phosphatidylinositol 4,5-
biphosphate", Nature, 325, pp. 362-64 (1987) and P.A.
Janmey and T.P. Stossel, "Gelsolin-phosphoinositide
Interaction", J. Biol. Chem., 264, pp. 4825-31 (1989)].
These polyphosphoinositides are minor membrane
phospholipids that play a role in signal transduction
in cells [B. Alberts et al., Molecular Biology of the
Cell, Second Edition, Garland Publishing, Inc., New
York, New York, pp. 702-703 (1989)]. Together they
comprise less than 10% of the total phospholipids of
cell membranes, and PIP2 comprises less than 1%. These
two molecules inhibit gelsolin activity by binding to
gelsolin and displacing the actin monomers that are
bound to it in a non-Ca++ dependent manner.
In extensively sonicated aqueous suspensions,
both PIP and PIP2 form vesicles. PIP2 forms small
vesicles, also called micelles, of about 80 nm in
diameter, that contain about one-hundred PIP2
molecules. Each PIP2 micelle binds about eight
gelsolin molecules. PIP forms larger unilamellar (one-
layered) vesicles. Aggregation of PIP2 into large
unilamellar or multilamellar vesicles in the presence of
millimolar concentrations of Mg or nonionic detergents
decreases the ability of PIP2 to inhibit the actin
filament-severing function of gelsolin. Incorporation
of PIP into mixed vesicles composed of phosphatidyl
choline (PC) also decreases this ability, although
about a third of maximal activity persists, even in
vesicles containing a very high ratio of PC to PIP2.
Mixed lipid vesicles whose composition approximates
that of the cell membrane (less than 3% PIP2) also
inhibit gelsolin activity. Several other
polyphosphoinositides which may be constructed, or have
already been identified in nature, would also be
expected to bind gelsolin.
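As a rough illustration of the micelle stoichiometry described above, the following Python sketch computes how many PIP2 molecules accompany each bound gelsolin molecule. It uses only the approximate figures quoted in the text (about one hundred PIP2 molecules and about eight bound gelsolin molecules per micelle); the function and constant names are hypothetical, and the values are estimates rather than measured constants.

```python
# Approximate values quoted in the text; all names here are illustrative only.
PIP2_PER_MICELLE = 100      # "about one-hundred PIP2 molecules" per micelle
GELSOLIN_PER_MICELLE = 8    # "about eight gelsolin molecules" per micelle


def lipids_per_bound_gelsolin(lipids: int, gelsolins: int) -> float:
    """Return the number of lipid molecules per bound gelsolin molecule."""
    if gelsolins <= 0:
        raise ValueError("gelsolins must be positive")
    return lipids / gelsolins


ratio = lipids_per_bound_gelsolin(PIP2_PER_MICELLE, GELSOLIN_PER_MICELLE)
print(f"about {ratio:.1f} PIP2 molecules per bound gelsolin molecule")
```

On these figures, a single micelle could in principle carry up to about eight functional moieties if each bound gelsolin were part of a fusion polypeptide.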
The cDNA for human plasma gelsolin encodes a
protein of 755 amino acids plus a 27 amino acid signal
sequence [Kwiatkowski et al., "Plasma and Cytoplasmic
Gelsolins Are Encoded by a Single Gene and Contain a
Duplicated Actin-binding Domain", Nature, 323, pp. 455-
58 (1986)]. This cDNA sequence accounts for both the
plasma and cytoplasmic forms of gelsolin, which are the
result of alternative transcriptional initiation sites
and message processing from a single gene, 70 kb long
[D. Kwiatkowski et al., "Genomic Organization and
Biosynthesis of Secreted and Cytoplasmic Forms of
Gelsolin", J. Cell Biol., 106, pp. 375-84 (1988)]. The
difference between the plasma and cytoplasmic forms is
a 25 amino-acid residue extension on plasma gelsolin.
This appears to account for the difference in relative
molecular weight between the proteins as assessed by
SDS-polyacrylamide gel electrophoresis (SDS-PAGE),
93 kD and 90 kD, respectively.
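The plausibility of this explanation can be checked with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the source, and compares the predicted mass of a 25-residue extension with the roughly 3 kD difference seen by SDS-PAGE. The names are hypothetical.

```python
# Assumption (not from the source): an average amino acid residue weighs ~110 Da.
AVG_RESIDUE_MASS_DA = 110.0


def extension_mass_kd(n_residues: int, avg_mass_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Estimate the mass, in kD, contributed by n_residues amino acids."""
    return n_residues * avg_mass_da / 1000.0


predicted_kd = extension_mass_kd(25)   # 25-residue extension on plasma gelsolin
observed_kd = 93.0 - 90.0              # apparent difference by SDS-PAGE
print(f"predicted {predicted_kd:.2f} kD, observed about {observed_kd:.0f} kD")
```

Under that assumption the two values agree to within a fraction of a kilodalton, which is about as close as SDS-PAGE estimates allow.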
Investigators have identified several
functional domains of gelsolin [H.L. Yin et al.,
"Identification of a Polyphosphoinositide-modulated
Domain in Gelsolin Which Binds to the Sides of Actin
Filaments", J. Cell Biol., 106, pp. 805-12 (1988) and
D. Kwiatkowski et al., "Identification of Critical
Functional and Regulatory Domains in Gelsolin", J. Cell
Biol., 108, pp. 1717-26 (1989)]. The gelsolin cDNA
contains a strong tandem repeat that divides the
molecule into two roughly equal halves. These
structural halves correspond to two functional halves:
The amino-terminal half of the protein contains a Ca++-
insensitive actin-severing function and the carboxy-
terminal half has a Ca++-sensitive actin binding domain.
Within these two tandem repeats are six domains of
weaker homology. The polypeptide has three actin
binding sites. Two monomer binding sites are located
between residues 26-139 and 407-756 (probably 661-738)
and an actin filament binding site is located between
residues 151-406. Amino acid residues 732-738 are
potentially important for Ca++ regulation. Residues
660-738 are important for nucleation. This function
probably requires actin binding sites on both halves of
the molecule. The severing function resides in
residues 1-160, possibly between residues 139-160, with
critical dependence on the sequence 150-160 (the first
eleven residues of domain two). The PIP2-regulation of
gelsolin's severing activity apparently resides within
the first 160 residues. Sequences in domains 2 and 3
appear to hide a cryptic Ca++-sensitive domain because
when they are removed, the severing function of
gelsolin becomes Ca++ dependent.
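The residue ranges given in this paragraph can be collected into a small lookup table. The following Python sketch is only an illustration: the region labels and the function name are hypothetical, and the ranges are copied from the text as stated, including the 407-756 range for the second monomer binding site.

```python
# Residue ranges as stated in the text; labels are illustrative, not official names.
GELSOLIN_REGIONS = [
    ("actin monomer binding site 1", 26, 139),
    ("actin filament binding site", 151, 406),
    ("actin monomer binding site 2", 407, 756),
    ("nucleation", 660, 738),
    ("Ca++ regulation", 732, 738),
    ("severing", 1, 160),
    ("severing (critical sequence)", 150, 160),
    ("PIP2 regulation of severing", 1, 160),
]


def regions_at(residue: int) -> list[str]:
    """Return the labels of all listed regions that contain the given residue."""
    return [name for name, start, end in GELSOLIN_REGIONS if start <= residue <= end]


print(regions_at(155))
print(regions_at(735))
```

A lookup of this kind makes it easy to see which functions a candidate gelsolin fragment would be expected to retain, for example a fragment spanning residues 150-160.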
Significantly, the amino acid sequence of
gelsolin exhibits homology with several other actin
binding proteins. It is forty-five percent homologous
with villin, found in vertebrate brush border
microvilli, which also has a Ca++-dependent actin
severing function. It is thirty-three percent
homologous with severin and fragmin [P. Matsudaira and
P. Janmey, "Pieces in the Actin-severing Protein
Puzzle", Cell, 54, pp. 139-40 (1988)]. These
polypeptides also bind PIP and PIP2.
Despite advances in biotechnology, the need
still exists for methods and products which optimize
the characteristics and delivery of pharmaceuticals,
vaccines, diagnostics and bioactive molecules --
including polyvalency, affinity for a single target
particle, serum half-life, bioactivity and, in some
cases, immunogenicity. Furthermore, systems in which
the component parts may be easily varied would be
especially useful because they would allow one to test
for species with optimal characteristics.
SUMMARY OF THE INVENTION
This invention solves these problems by
providing multimeric and hetero-multimeric gelsolin
fusion constructs. A multimeric gelsolin fusion
construct is a vesicle comprising at least one
polyphosphoinositide, such as PIP or PIP2, to which
gelsolin fusion polypeptides are bound. Gelsolin
fusion polypeptides comprise gelsolin moieties linked
to functional moieties which may be pharmaceutical
agents, vaccine agents, diagnostic agents or other
bioactive molecules. Hetero-multimeric gelsolin fusion
constructs comprise at least two different functional
moieties or gelsolin moieties.
Gelsolin is a particularly attractive
candidate for attachment to lipid vesicles because it
binds specifically and with great affinity to
polyphosphoinositides. Other proteins, related to
gelsolin, which also specifically bind
polyphosphoinositides may also be employed. Some
examples are villin, fragmin, severin, profilin,
cofilin, Cap42(a), gCap39, CapZ and destrin.
Lipocortin (annexin) and DNaseI are other molecules
that bind polyphosphoinositides. Proteins that
specifically bind other lipids may also be used, as
well as proteins that bind lipids non-specifically.
The fusion constructs of this invention
advantageously utilize the ability of
polyphosphoinositide vesicles to bind multiple copies
of gelsolin fusion polypeptides. Consequently, in
contrast to monomeric molecules, the bioactive
molecules linked to them as functional moieties are
characterized by one or more of the following:
polyvalency, increased serum half-life, affinity for
target particles or cells, greater bioactivity or
immunogenicity, and targetability.
The present invention also provides gelsolin
fusion polypeptides. Gelsolin fusion polypeptides
comprise gelsolin moieties fused or chemically coupled
to a functional moiety. In particular, this invention
provides CD4-gelsolin fusion polypeptides.
The lipid composition of a vesicle may also
be varied to permit the production of vesicles varying
in fluidity, size, the number of gelsolin molecules
that will bind to it and the rate of degradation in the
blood stream.
Depending upon the choice of functional
moiety, multimeric and hetero-multimeric gelsolin
fusion constructs are characterized by many uses.
Recognition molecules, such as those containing the
antigen binding site of antibodies, viral receptors or
cell receptors, are useful as functional moieties to
target fusion proteins to particular antigens. When
targeted in this manner, multimeric gelsolin fusion
constructs are useful to block the binding of viruses
to cells that results in infection, or the binding of
cells to other cells that, for example, characterizes
pathologic inflammation. Due to the multivalency of
the fusion constructs of this invention, we believe
that they possess greater affinity for the target than
monovalent molecules. In one embodiment of this
invention, the functional moiety is the receptor on
human lymphocytes, CD4, which is the target of the HIV
virus -- the causative agent of AIDS and ARC.
When hetero-multimeric fusion constructs
comprise gelsolin fusion polypeptides having
combinations of recognition molecules and toxins,
anti-retroviral agents or radionuclides, they are
useful as therapeutic agents which search out and
destroy their target.
Multimeric gelsolin fusion constructs with
recognition molecules are also useful for signal
enhancement in diagnostic assays. As large, multimeric
molecules, they present many binding sites for reporter
molecules, such as horseradish peroxidase-conjugated
antibodies. Alternatively, they may take the form of
hetero-multimeric constructs, possessing both
recognition molecules and multiple reporter groups.
When the functional moiety is one or more
immunogens from one or more infectious agents, the fusion
proteins of this invention are useful in vaccines.
Also, multimeric gelsolin fusion constructs
may be employed as agents with increased bioactivity
when the functional group is an enzyme, substrate, or
inhibitor.
This invention also provides multimeric
gelsolin fusion constructs that are liposomes whose
constituents include polyphosphoinositides and that
contain bioactive agents in their interiors.
This invention further provides DNA sequences
that encode gelsolin fusion polypeptides, recombinant
DNA molecules comprising them and unicellular host
cells transformed with them. And this invention
provides methods for producing these fusion
polypeptides by culturing such hosts.
This invention also provides compositions
comprising any of the above-identified fusion
polypeptides or proteins that are useful as
therapeutic, prophylactic or diagnostic agents.
Multimeric CD4-gelsolin fusion constructs may be used
in diagnosing, preventing and treating AIDS, ARC or HIV
infection.
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1A-1F ("Figure 1") (SEQ ID NO:1)
depict the DNA sequence and deduced amino acid sequence
of human gelsolin as set forth in D.J. Kwiatkowski
et al., Nature, 323, pp. 455-58 (1986). The negatively
numbered amino acids correspond to the signal sequence,
which is absent from the mature polypeptide.
Throughout this specification, references to human
gelsolin by amino acid sequence or DNA sequence
correspond to the coordinate system set forth in this
figure.
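The numbering convention described above can be sketched in Python. This is a minimal illustration, assuming the 27-residue signal sequence is numbered -27 to -1 and the 755-residue mature polypeptide is numbered +1 to +755, with no residue numbered 0. The absence of a zero position is our assumption about the convention, and the function names are hypothetical.

```python
SIGNAL_LENGTH = 27    # residues in the signal sequence (numbered -27 to -1)
MATURE_LENGTH = 755   # residues in the mature polypeptide (numbered +1 to +755)


def precursor_index_to_coordinate(index: int) -> int:
    """Map a 1-based position in the full precursor to the figure's coordinate."""
    if not 1 <= index <= SIGNAL_LENGTH + MATURE_LENGTH:
        raise ValueError("index lies outside the precursor")
    if index <= SIGNAL_LENGTH:
        return index - SIGNAL_LENGTH - 1   # 1 -> -27, 27 -> -1
    return index - SIGNAL_LENGTH           # 28 -> +1, 782 -> +755


def coordinate_to_precursor_index(coordinate: int) -> int:
    """Inverse mapping; coordinate 0 is rejected because it is assumed unused."""
    if coordinate == 0 or not -SIGNAL_LENGTH <= coordinate <= MATURE_LENGTH:
        raise ValueError("coordinate is not used in this numbering")
    if coordinate < 0:
        return coordinate + SIGNAL_LENGTH + 1
    return coordinate + SIGNAL_LENGTH


print(precursor_index_to_coordinate(28))
print(coordinate_to_precursor_index(-27))
```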
Figure 2 depicts the functional regions of
human gelsolin amino acid sequence.
Figures 3A-3D ("Figure 3") (SEQ ID NO:2)
depict the DNA sequence and deduced amino acid sequence
of human CD4 DNA. Nucleotides 1-75 are derived from
plasmid p170.2. Nucleotides 76-741 are derived from
plasmid pCD4-gelsolin. Nucleotides 742 to 1377 are
derived from p170.2. Throughout this specification,
references to CD4 by amino acid or DNA sequence
correspond to the coordinate system of this figure,
unless otherwise specified.
Figure 4 depicts the domain structure of
human CD4 protein. The numbered amino acids are
cysteine residues involved in disulfide bonding
according to Figure 3.
Figure 5 depicts the DNA sequences of the
oligomers used in the processes set forth in the
examples of this application. The gelsolin sequences
in this figure are derived from SEQ ID NO:1. ACE 144
is SEQ ID NO:3. ACE 145 is SEQ ID NO:4. T4AID-133 is
SEQ ID NO:5. T4AID-134 is SEQ ID NO:6. T4AID-137 is
SEQ ID NO:7. T4AID-176 is SEQ ID NO:8. T4AID-176 is
SEQ ID NO:9.
Figure 6 depicts the construction of plasmid
pCD4-gelsolin.
Figures 7A-7B ("Figure 7") (SEQ ID NO:10)
depict the DNA sequence and deduced amino acid
sequence of pCD4-gelsolin.
Figure 8 is a restriction map of
pCD4-gelsolin.
Figure 9 depicts the construction of plasmid
pDC219.
Figures 10A-10F ("Figure 10") (SEQ ID NO:11)
depict the DNA sequence of p218-8.
Figure 11 depicts the construction of plasmid
P~PL180CYS-
Figures 12A-12I ("Figure 12") (SEQ ID NO:12)
depict the DNA sequence of pBG391.
Figures 13A-13H ("Figure 13") (SEQ ID NO:13)
depict the DNA sequence of pEX46.
DETAILED DESCRIPTION OF THE INVENTION
"Human plasma gelsolin" refers to a
polypeptide having the amino acid sequence depicted in
Figure 1 (SEQ ID NO:1) from amino acids -27 to +755.
It should be understood that polypeptide expression
often involves post-translational modifications such as
cleavage of the signal sequence, intramolecular
disulfide bonding, glycosylation and the like. The use
of the term, human plasma gelsolin, contemplates such
modifications to the amino acid sequence of Figure 1
(SEQ ID NO:1). The term also includes gelsolin
obtained from natural, recombinant or synthetic
sources.
"Multimeric gelsolin fusion constructs" and
"hetero-multimeric gelsolin fusion constructs" each
comprise gelsolin fusion polypeptides bound to a
vesicle of aggregated phospholipids. A "gelsolin
fusion polypeptide" comprises a gelsolin moiety bound
to a functional moiety. "Functional moieties" may be
polypeptides ("polypeptide moieties") or chemical
compounds ("chemical moieties"). Throughout this
application, specific gelsolin fusion polypeptides are
referred to by the name of the functional moiety. For
example, we call a gelsolin fusion polypeptide having
CD4 as the functional moiety, CD4-gelsolin fusion
polypeptide. Hetero-multimeric gelsolin fusion
constructs comprise at least two different functional
moieties or gelsolin moieties.
When the functional moiety is a polypeptide,
gelsolin fusion polypeptides may be produced by
chemical crosslinking or genetic fusion. Genetic
fusion involves creating a hybrid DNA sequence in which
the DNA sequence encoding the polypeptide is fused to
the 5' end or 3' end of a DNA sequence encoding the
gelsolin moiety. Upon expression in an appropriate
host, this hybrid DNA sequence produces a gelsolin
fusion polypeptide in which the polypeptide moiety is
fused to the N-terminus or C-terminus of the gelsolin
moiety.
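The idea of joining two coding sequences so that they are read as one polypeptide can be sketched in Python. This is a toy illustration under stated assumptions, not the construction method of the examples: the two sequences are short hypothetical open reading frames (they are not gelsolin or CD4 sequences), and the function name is ours. The sketch checks that each part is a whole number of codons and removes a stop codon at the end of the upstream part so that translation continues into the downstream part.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream: str, downstream: str) -> str:
    """Join two coding sequences so the downstream one is read in frame."""
    upstream, downstream = upstream.upper(), downstream.upper()
    for name, seq in (("upstream", upstream), ("downstream", downstream)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence is not a whole number of codons")
    # Drop a terminal stop codon so translation reads through into the fusion partner.
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    return upstream + downstream


# Hypothetical toy sequences, for illustration only.
FUNCTIONAL_MOIETY_ORF = "ATGGCCTAA"   # Met-Ala-stop
GELSOLIN_MOIETY_ORF = "GGTTCTTGA"     # Gly-Ser-stop

hybrid = fuse_in_frame(FUNCTIONAL_MOIETY_ORF, GELSOLIN_MOIETY_ORF)
print(hybrid)
```

Passing the sequences in this order places the polypeptide moiety at the N-terminus of the gelsolin moiety; swapping the arguments would place it at the C-terminus instead.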
A "gelsolin moiety" as used herein is
gelsolin or a fragment thereof that specifically binds
to a polyphosphoinositide. Preferably, the gelsolin
moiety will be derived from human plasma gelsolin. A
gelsolin moiety preferably includes amino acids +150 to
+160 of Figure 1 (SEQ ID NO:1). As demonstrated
herein, the polypeptide containing amino acids +150 to
+169 of Figure 1 (SEQ ID NO:1) has the ability to bind
PIP2. We believe that gelsolin derived from non-human
vertebrates may also be useful according to this
invention. The structure of gelsolin is highly
conserved in evolution and gelsolin from non-human
mammals may not be immunogenic in humans.
Lipid binding proteins ("LBPs") other than
gelsolin are also known to exist. These proteins, or
fragments of them that bind to particular lipids, are
useful as LBP moieties (similarly to gelsolin moieties)
to produce LBP fusion polypeptides that bind to
vesicles containing the particular lipid. This creates
multimeric or hetero-multimeric LBP fusion constructs.
Gelsolin-related proteins that specifically bind
polyphosphoinositides include villin, severin, fragmin,
profilin, cofilin, Cap42(a), gCap39, CapZ and destrin
[E. André et al., "Severin, Gelsolin, and Villin Share
a Homologous Sequence in Regions Presumed to Contain
F-actin Severing Domains", J. Biol. Chem., 263,
pp. 722-27 (1988); W.L. Bazari et al., "Villin Sequence
and Peptide Map Identify Six Homologous Domains", Proc.
Natl. Acad. Sci. USA, 85, pp. 4986-90 (1988); C. Ampe
et al., "The Primary Structure of Human Platelet
Profilin: Reinvestigation of the Calf Spleen Sequence",
FEBS Letters, 228, pp. 17-21 (1988); D.J. Kwiatkowski
and G.A.P. Bruns, "Human Profilin", J. Biol. Chem.,
263, pp. 5910-15 (1988); I. Lassing and U. Lindberg,
"Specificity of the Interaction Between
Phosphatidylinositol 4,5-biphosphate and the Profilin:
Actin Complex", J. Cell. Biochem., 37, pp. 255-67
(1988); C. Ampe and J. Vandekerckhove, "The F-actin
Capping Proteins of Physarum polycephalum", EMBO J.,
6, pp. 4149-57 (1987); I. Lassing and U. Lindberg,
"Specific Interaction between Phosphatidylinositol 4,5-
biphosphate and Profilactin", Nature, 314, pp. 472-74
(1985); F.-X. Yu et al., "gCap39, a Calcium Ion- and
Polyphosphoinositide-regulated Actin Capping Protein",
Science, 250, pp. 1413-15 (1990); and N. Yonezawa
et al., "Inhibition of the Interactions of Cofilin,
Destrin and Deoxyribonuclease I with Actin by
Phosphoinositides", J. Biol. Chem., 265, pp. 8382-86
(1990)]. Other LBPs that specifically bind
polyphosphoinositides are lipocortin [K. Machoczek
et al., "Lipocortin I and Lipocortin II Inhibit
Phosphoinositide and Polyphosphoinositide-specific
Phospholipase C" FEBS Letters, 251, pp. 207-12 (1989)]
and DNase I [J.A. Cooper et al., "The Role of Actin
Polymerization in Cell Motility", Ann. Rev. Phys., 53,
pp. 585-605 (1991)]. Protein kinase C is also an LBP
which binds to some phospholipids.
DNA sequences encoding gelsolin moieties are
derived from DNA sequences encoding gelsolin. Several
methods are available to obtain these DNA sequences.
First, one can chemically synthesize the gelsolin gene
or a degenerate version of it using a commercially
available chemical synthesizer. Figure 1 (SEQ ID NO:1)
sets forth a DNA sequence for gelsolin. The coding
region encompasses nucleotides +1 to +2360.
Second, one can isolate a cDNA sequence
encoding gelsolin by screening a cDNA library. Many
screening methods are known to the art. For example,
colonies may be screened by nucleic acid hybridization
with oligonucleotide probes. Probes may be prepared by
chemically synthesizing an oligonucleotide having part
of the known DNA sequence of gelsolin. Alternatively,
cDNA libraries may be constructed in expression
vectors, such as λgt11, and the colonies screened with
anti-gelsolin antibodies.
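The logic of hybridization screening with an oligonucleotide probe can be mimicked in software as a simple sequence search. The Python sketch below is an illustration only: it treats hybridization as an exact match of the probe, or of its reverse complement, to one strand of a clone, which ignores mismatches and hybridization conditions. The sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_probe_sites(target: str, probe: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each exact probe match on either strand."""
    target, probe = target.upper(), probe.upper()
    sites = []
    for strand, query in (("+", probe), ("-", reverse_complement(probe))):
        start = target.find(query)
        while start != -1:
            sites.append((start, strand))
            start = target.find(query, start + 1)
    return sites


# Hypothetical toy clone and probe, for illustration only.
print(find_probe_sites("AAATGCCC", "ATGC"))
```

A clone would count as a candidate when the returned list is not empty; a practical screen would also need to tolerate mismatches, which this sketch does not attempt.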
Third, one can isolate a cDNA encoding
gelsolin or a gelsolin moiety by amplifying DNA with
polymerase chain reaction (PCR). We describe this
process in Example I.
The DNA sequence encoding the gelsolin moiety
may then be fused to a DNA sequence encoding the
polypeptide moiety. DNA sequences for the polypeptide
moieties useful in this invention are available from
many sources. These include DNA sequences described in
the literature and DNA sequences for particular
polypeptides obtained by any of the conventional
molecular cloning techniques.
A wide array of polypeptides are useful to
produce the gelsolin fusion polypeptides of this
invention. Those most useful include polypeptides that
are advantageously administered in multimeric form.
For example, viral receptors, cell receptors or cell
ligands are useful because they typically bind to
particles or cells exhibiting many copies of the
receptor. Fusion constructs containing these fusion
polypeptides are useful in therapies that involve the
inhibition of viral-cell or cell-cell binding. Useful
viral-cell receptors include ICAM1, a rhinovirus
receptor; the polio virus receptor [J. M. White and
D.R. Littman, "Viral Receptors of the Immunoglobulin
Superfamily", Cell, 56, pp. 725-28 (1989)] and, most
preferably, CD4, the HIV receptor. Cell-cell receptors
or ligands include members of the vascular cell
adhesion molecule family, such as ICAM1, ELAM1, VCAM1
and VCAM1b and their lymphocyte counterparts (ligands)
LFA1, CDX and VLA4. These molecules are involved in
pathologic inflammation [M.P. Bevilacqua et al.,
"Identification of an Inducible Endothelial-Leukocyte
Adhesion Molecule", Proc. Natl. Acad. Sci., USA, 84,
pp. 9238-42 (1987); L. Osborn et al., "Direct
Expression Cloning of Vascular Cell Adhesion
Molecule 1: A Cytokine-induced Endothelial Protein
that Binds to Lymphocytes", Cell, 59, pp. 1203-11
(1989); C.A. Hession et al., "Endothelial Cell-
leukocyte Adhesion Molecules (ELAMs) and Molecules
Involved in Leukocyte Adhesion (MILAs)", WO 90/13300].
Other lymphocyte associated antigens, such as LFA2
(CD2) and LFA3 (both members of the CD11/CD18 family)
and PAGEM are also useful.
Bacterial immunogens, parasitic immunogens
and viral immunogens may be used as polypeptide
moieties to produce multimeric or hetero-multimeric
gelsolin fusion constructs useful as vaccines.
Bacterial sources of these immunogens include those
responsible for bacterial pneumonia and pneumocystis
pneumonia. Parasitic sources include malarial
parasites, such as Plasmodium. Viral sources include
poxviruses, e.g., cowpox virus and orf virus; herpes
viruses, e.g., herpes simplex virus type 1 and 2,
B-virus, varicella-zoster virus, cytomegalovirus, and
Epstein-Barr virus; adenoviruses, e.g., mastadenovirus;
papovaviruses, e.g., papillomaviruses, and
polyomaviruses such as BK and JC virus; parvoviruses,
e.g., adeno-associated virus; reoviruses, e.g.,
reoviruses 1, 2 and 3; orbiviruses, e.g., Colorado tick
fever; rotaviruses, e.g., human rotaviruses;
alphaviruses, e.g., Eastern encephalitis virus and
Venezuelan encephalitis virus; rubiviruses, e.g.,
rubella; flaviviruses, e.g., yellow fever virus, Dengue
fever viruses, Japanese encephalitis virus, Tick-borne
encephalitis virus and hepatitis C virus;
coronaviruses, e.g., human coronaviruses;
paramyxoviruses, e.g., parainfluenza 1, 2, 3 and 4 and
mumps; morbilliviruses, e.g., measles virus;
pneumovirus, e.g., respiratory syncytial virus;
vesiculoviruses, e.g., vesicular stomatitis virus;
lyssaviruses, e.g., rabies virus; orthomyxoviruses,
e.g., influenza A and B; bunyaviruses, e.g., LaCrosse
virus; phleboviruses, e.g., Rift Valley fever virus;
nairoviruses, e.g., Congo hemorrhagic fever virus;
hepadnaviridae, e.g., hepatitis B; arenaviruses, e.g.,
LCM virus, Lassa virus and Junin virus; retroviruses,
e.g., HTLV I, HTLV II, HIV I and HIV II; enteroviruses,
e.g., polio virus 1, 2 and 3, coxsackie viruses,
echoviruses, human enteroviruses, hepatitis A virus,
hepatitis E virus, and Norwalk virus; rhinoviruses,
e.g., human rhinovirus; and filoviridae, e.g., Marburg
(disease) virus and Ebola virus.
Immunoglobulins or fragments thereof that bind
to a target molecule may also be employed as functional
moieties. Immunoglobulin molecules are bivalent, but
multimeric immunoglobulin-gelsolin fusion constructs,
which are multivalent, may demonstrate increased
affinity or avidity for the target. Investigators have
also made use of single domain antibodies (dAbs) [E.S.
Ward et al., "Binding Activities of a Repertoire of
Single Immunoglobulin Variable Domains Secreted from
Escherichia coli", Nature, 341, pp. 544-46 (1989)].
One can generate monoclonal Fab fragments recognizing
specific antigens using the technique of Huse et al.
and use individual domains as functional moieties in
multimeric or hetero-multimeric gelsolin fusion
constructs according to this invention [W.D. Huse
et al., "Generation of a Large Combinatorial Library of
the Immunoglobulin Repertoire in Phage Lambda",
Science, 246, pp. 1275-81 (1989); see also A. Skerra
and A. Plückthun, "Assembly of a Functional
Immunoglobulin Fv Fragment in Escherichia coli",
Science, 240, pp. 1038-43 (1988)].
According to this invention, multimeric
gelsolin fusion constructs may be produced in which the
functional moiety is an enzyme, enzyme substrate or
enzyme inhibitor. We believe that such agents will
exhibit greater bioactivity than monomeric molecules
because multimers have a higher density of the moiety
and will exhibit increased catalytic rate. For
example, we believe that a multimeric gelsolin fusion
construct with tissue plasminogen activator would have
greater clot-dissolving catalytic activity than its
monovalent counterpart. Similarly, we believe that a
multimeric gelsolin fusion construct with hirudin would
demonstrate greater anti-coagulant activity than
hirudin alone.
Other useful functional moieties include, but
are not limited to, polypeptides such as cytokines,
including the various IFN-α's, particularly α2, α5, α8,
IFN-β and IFN-γ, the various interleukins, including
IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7 and IL-8 and
the tumor necrosis factors, TNF-α and β. In addition,
functional moieties include monocyte colony stimulating
factor (M-CSF), granulocyte colony stimulating factor
(G-CSF), granulocyte macrophage colony stimulating
factor (GM-CSF), erythropoietin, platelet-derived
growth factor (PDGF), and human and animal hormones,
including growth hormones and insulin.
According to one embodiment of this
invention, multimeric gelsolin fusion constructs
comprise CD4-gelsolin fusion polypeptides. CD4 is the
receptor on those white blood cells, T-lymphocytes,
which recognizes HIV, the causative agent of AIDS and
ARC [P.J. Maddon et al., "The T4 Gene Encodes the AIDS
Virus Receptor and Is Expressed in the Immune System
and the Brain", Cell, 47, pp. 333-48 (1986)].
Specifically, CD4 recognizes the HIV viral surface
proteins, gp120 and gp160. In CD4-gelsolin fusion
polypeptides the functional moiety is a polypeptide
moiety comprising full length CD4 or a fragment
thereof, preferably soluble CD4. Use of the term, CD4,
in this specification may refer to full length CD4 or
fragments of CD4, unless otherwise specified.
A DNA sequence encoding full length human CD4
polypeptide and its deduced amino acid sequence is set
forth in Figure 3 (SEQ ID NO:2). (See also P.J. Maddon
et al., "The Isolation and Nucleotide Sequence of a
cDNA Encoding the T Cell Surface Protein T4: A New
Member of the Immunoglobulin Gene Family", Cell, 42,
pp. 93-104 (1985).) Based upon its deduced primary
structure, the CD4 polypeptide is divided into
functional domains as follows:
Structure/Proposed Location                          Amino Acid Coordinates in Figure 3
Hydrophobic/Secretory Signal                         -25 to -1
First Immunoglobulin-related domain/Extracellular    +1 to +107
Second Immunoglobulin-related domain/Extracellular   +108 to +177
Third Immunoglobulin-related domain/Extracellular    +178 to +293
Fourth Immunoglobulin-related domain/Extracellular   +294 to +370
Hydrophobic/Transmembrane Sequence                   +371 to +391
Very Hydrophilic/Intracytoplasmic                    +392 to +431
The first immunoglobulin-related domain can be further
resolved into a variable-related (V) region and joint-
related (J) region, beginning at about amino acid +95
[S.J. Clark et al., "Peptide and Nucleotide Sequences
of Rat CD4 (W3/25) Antigen: Evidence for Derivation
from a Structure with Four Immunoglobulin-related
Domains", Proc. Natl. Acad. Sci., USA, 84, pp. 1649-53
(1987)].
These domains also correspond roughly to
structural domains of the CD4 protein due to
intra-domain disulfide bonding. Thus, disulfide bonds join
amino acids at positions +16 and +84 in the first
immunoglobulin-related domain, amino acids +130 and
+159 of the second immunoglobulin-related domain, and
amino acids +303 and +345 of the fourth immunoglobulin-
related domain. Figure 4 depicts the domain structure
of the full length human CD4 protein.
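
By way of illustration only, the domain coordinates and disulfide positions recited above can be held in a small lookup table. The following Python sketch is not part of the original disclosure: the names CD4_DOMAINS, CD4_DISULFIDES and domain_of are hypothetical, and the coordinates are simply those stated in the text for Figure 3.

```python
# Domain coordinates as recited in the text (Figure 3 numbering).
# All names here are hypothetical and used for illustration only.
CD4_DOMAINS = [
    ("secretory signal", -25, -1),
    ("first Ig-related domain", 1, 107),
    ("second Ig-related domain", 108, 177),
    ("third Ig-related domain", 178, 293),
    ("fourth Ig-related domain", 294, 370),
    ("transmembrane sequence", 371, 391),
    ("intracytoplasmic", 392, 431),
]

# Intra-domain disulfide bonds stated in the text.
CD4_DISULFIDES = [(16, 84), (130, 159), (303, 345)]


def domain_of(position):
    """Return the name of the domain that contains a residue position."""
    for name, start, end in CD4_DOMAINS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the listed coordinates")


def disulfides_are_intradomain():
    """Check that both cysteines of every listed bond lie in one domain."""
    return all(domain_of(a) == domain_of(b) for a, b in CD4_DISULFIDES)
```

Under these stated coordinates, each listed disulfide pair falls within a single domain, which is consistent with the statement that the functional domains correspond roughly to structural domains.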
Soluble CD4 proteins have been constructed by
truncating the full length CD4 protein at amino acid
+375, to eliminate the transmembrane and cytoplasmic
domains. Such proteins have been produced by
recombinant DNA techniques and are referred to as
recombinant soluble CD4 (rsCD4) [R.A. Fisher et al.,
"HIV Infection Is Blocked In Vitro by Recombinant
Soluble CD4", Nature, 331, pp. 76-78 (1988); Fisher
et al., PCT patent application WO 89/01940
(incorporated herein by reference)]. These soluble CD4
proteins advantageously interfere with the CD4+
lymphocyte/HIV interaction by blocking or competitive
binding mechanisms which inhibit HIV infection of cells
expressing the CD4 protein. The first
immunoglobulin-related domain is sufficient to bind gp120 and gp160.
By acting as soluble virus receptors, soluble CD4
proteins are useful as antiviral therapeutics to
inhibit HIV binding to CD4+ lymphocytes and virally
induced syncytia formation.
The CD4 polypeptides useful in this invention
include all CD4 polypeptides which bind to or otherwise
inhibit gp120 and gp160. These include fragments of
CD4 lacking the transmembrane domain, amino acids +371
to +391 of Figure 3 (SEQ ID NO:2). Such fragments may
be truncated forms of CD4 or be fusion proteins in
which the fourth immunoglobulin-related domain is
joined directly to the hydrophilic cytoplasmic domain.
We shall refer herein to a CD4 polypeptide which
includes amino acids +1 to +X of Figure 3 (SEQ ID
NO:2), and which optionally includes an N-terminal
methionine or f-methionine, as "CD4(X)". When a CD4
polypeptide is engineered to include a carboxy-terminal
cysteine, we shall refer to the polypeptide as
"CD4(XCys)".
For example, referring now to Figure 3 (SEQ
ID NO:2), a soluble CD4 protein containing the first
immunoglobulin-like domain preferably will contain at
least amino acids +1 to +84 and at most amino acids +1
to +129. Most preferably, a soluble CD4 protein
comprises amino acids +1 to +111 [CD4(111)]. A soluble
CD4 protein containing the first two immunoglobulin-like
domains preferably will include at least amino
acids +1 to +159 and at most amino acids +1 to +302.
More preferably, a soluble CD4 protein will include at
least amino acids +1 to +175 and at most amino acids +1
to +190. Most preferably, a soluble CD4 protein will
include amino acids +1 to +181 [CD4(181)], amino acids
+1 to +183 [CD4(183)], or amino acids +1 to +187
[CD4(187)]. A soluble CD4 protein which includes the
first four immunoglobulin-like domains preferably will
include at least amino acids +1 to +345 [CD4(345)] and
at most amino acids +1 to +375 [CD4(375)]. Any of
these molecules may optionally include the CD4 signal
sequence, amino acids -23 to -1 of Figure 3 (SEQ ID
NO:2). Also, these molecules may have a modified
methionine residue preceding amino acid +1.
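
As an illustration only, the "CD4(X)" nomenclature and the preferred lower limits recited above can be related to the disulfide positions stated earlier (+16/+84, +130/+159 and +303/+345). The Python sketch below is not part of the original disclosure; the function names cd4_name and retained_disulfides are hypothetical.

```python
# Disulfide positions as stated earlier in the text (Figure 3 numbering).
CD4_DISULFIDES = [(16, 84), (130, 159), (303, 345)]


def cd4_name(x, c_terminal_cys=False):
    """Name a truncated CD4 polypeptide spanning amino acids +1 to +x."""
    return f"CD4({x}Cys)" if c_terminal_cys else f"CD4({x})"


def retained_disulfides(x):
    """List the disulfide pairs whose two cysteines both lie in +1..+x."""
    return [(a, b) for a, b in CD4_DISULFIDES if b <= x]
```

For example, under these assumptions retained_disulfides(181) returns the first two pairs, so CD4(181) would retain both cysteines of the first and second domain disulfides. The lower limits of +84, +159 and +345 recited above coincide with the second cysteine of each pair.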
Soluble CD4 proteins useful in the fusion
polypeptides and methods of this invention may be
produced in a variety of ways. According to the
coordinate system in Figure 3 (SEQ ID NO:2), the amino
terminal amino acid of mature CD4 protein isolated from
T cells is lysine, encoded at nucleotides 136 to 139 of
Figure 3 (SEQ ID NO:2). [D.R. Littman et al.,
"Corrected CD4 Sequence", Cell, 55, p. 541 (1988).]
Soluble CD4 proteins also include those in which amino
acid +1 is asparagine, +62 is arginine and +229 is
phenylalanine. Therefore, when we refer to CD4, we
intend to include amino acid sequences including any or
all of these substitutions. Soluble CD4 polypeptides
may be produced by conventional recombinant techniques
involving oligonucleotide-directed mutagenesis and
restriction digestion, followed by insertion of
linkers, or by digesting full-length CD4 protein with
enzymes.
Soluble CD4 proteins include those produced
by recombinant techniques according to the processes
set forth in copending, commonly assigned United States
patent applications Serial No. 094,322, filed
September 4, 1987 and Serial No. 141,649, filed
January 7, 1988, and PCT patent application Serial
No. PCT/US88/02940, filed September 1, 1988, and
published as PCT patent application WO 89/01940, the
disclosures of which are hereby incorporated by
reference.
Microorganisms and recombinant DNA molecules
characterized by DNA sequences coding for soluble CD4
proteins are exemplified by cultures described in PCT
patent application WO 89/01940. They were deposited in
the In Vitro International, Inc. culture collection, in
Linthicum, Maryland, USA on September 2, 1987 and
identified as:
EC100: E.coli JM83/pEC100 - IVI 10146
BG377: E.coli MC1061/pBG377 - IVI 10147
BG380: E.coli MC1061/pBG380 - IVI 10148
BG381: E.coli MC1061/pBG381 - IVI 10149.
Such microorganisms and recombinant DNA molecules are
also exemplified by cultures deposited in the In Vitro
International, Inc. culture collection on January 6,
1988 and identified as:
BG-391: E.coli MC1061/pBG391 - IVI 10151
BG-392: E.coli MC1061/pBG392 - IVI 10152
BG-393: E.coli MC1061/pBG393 - IVI 10153
BG-394: E.coli MC1061/pBG394 - IVI 10154
BG-396: E.coli MC1061/pBG396 - IVI 10155
203-5 : E.coli SG936/p203-5 - IVI 10156.
Additionally, such microorganisms and
recombinant DNA molecules are exemplified by cultures
deposited in the In Vitro International, Inc. culture
collection on August 24, 1988 and identified as:
211-11: E.coli A89/pBG211-11 - IVI 10183
214-10: E.coli A89/pBG214-10 - IVI 10184
215-7 : E.coli A89/pBG215-7 - IVI 10185.
Multimeric CD4-gelsolin fusion constructs
comprising CD4-gelsolin fusion polypeptides may be used
in pharmaceutical compositions and methods to treat
humans having AIDS, ARC, HIV infection, or antibodies
to HIV. Accordingly, they may be used to lessen the
immuno-compromising effects of HIV infection or to
prevent the incidence and spread of HIV infection. In
addition, these fusion proteins and methods may be used
for treating AIDS-like diseases caused by retroviruses,
such as simian immunodeficiency viruses, in mammals,
including humans.
DNA sequences encoding gelsolin fusion
polypeptides are useful for producing multimeric
gelsolin fusion constructs. The preferred process for
using these DNA sequences involves expressing the
gelsolin fusion polypeptide in an appropriate host,
isolating the polypeptide, and binding it to a vesicle
comprising a polyphosphoinositide.
As is well known in the art, for expression
of the DNA sequences of this invention, the DNA
sequence should be operatively linked to an expression
control sequence in an appropriate expression vector
and employed in that expression vector to transform an
appropriate unicellular host. Such operative linking
of a DNA sequence of this invention to an expression
control sequence, of course, includes the provision of
a translational start signal in the correct reading
frame upstream of the DNA sequence. If a particular
DNA sequence being expressed does not begin with an
ATG, the start signal will result in an additional
amino acid -- methionine (or f-methionine in
bacteria) -- being located at the N-terminus of the
product. While such methionyl-containing product may
be employed directly in the compositions and methods of
this invention, it is usually more desirable to remove
the methionine before use. Methods are known to those
of skill in the art to remove such N-terminal
methionines from polypeptides expressed with them. For
example, certain hosts and fermentation conditions
permit removal of substantially all of the N-terminal
methionine in vivo. Expression in other hosts requires
in vitro removal of the N-terminal methionine.
However, such in vivo and in vitro methods are well
known in the art.
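
The consequence of providing a translational start signal can be illustrated with a minimal Python sketch. It is not part of the original disclosure: the function name with_start_codon is hypothetical, and the sketch assumes the supplied sequence is already in the correct reading frame.

```python
def with_start_codon(coding_sequence):
    """Return (sequence, added) where 'added' tells whether an ATG was
    prepended, i.e. whether the product would carry an extra N-terminal
    methionine (or f-methionine in bacteria)."""
    seq = coding_sequence.upper()
    if seq.startswith("ATG"):
        return seq, False
    return "ATG" + seq, True
```

For instance, a hypothetical sequence beginning with the lysine codon AAG would gain a leading ATG, whereas a sequence already beginning with ATG would be returned unchanged.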
A wide variety of host/expression vector
combinations may be employed in expressing the DNA
sequences of this invention. Useful expression
vectors, for example, may consist of segments of
chromosomal, non-chromosomal and synthetic DNA
sequences, such as various known derivatives of SV40
and known bacterial plasmids, e.g., plasmids from
E.coli including colE1, pCR1, pBR322, pMB9 and their
derivatives, wider host range plasmids, e.g., RP4,
phage DNAs, e.g., the numerous derivatives of phage λ,
e.g., NM989, and other DNA phages, e.g., M13 and
filamentous single stranded DNA phages, yeast plasmids,
such as the 2μ plasmid or derivatives thereof, and
vectors derived from combinations of plasmids and phage
DNAs, such as plasmids which have been modified to
employ phage DNA or other expression control sequences.
In addition, any of a wide variety of
expression control sequences -- sequences that control
the expression of a DNA sequence when operatively
linked to it -- may be used in these vectors to express
the DNA sequences of this invention. Such useful
expression control sequences include, for example, the
early and late promoters of SV40 or adenovirus, the lac
system, the trp system, the TAC or TRC system, the
major operator and promoter regions of phage λ, the
control regions of fd coat protein, the promoter for
3-phosphoglycerate kinase or other glycolytic enzymes,
the promoters of acid phosphatase, e.g., Pho5, the
promoters of the yeast α-mating factors, the polyhedrin
promoter of the baculovirus system and other sequences
known to control the expression of genes of prokaryotic
or eukaryotic cells or their viruses, and various
combinations thereof.
A wide variety of unicellular host cells are
also useful in expressing the DNA sequences of this
invention. These hosts include well known eukaryotic
and prokaryotic hosts, such as strains of E.coli,
Pseudomonas, Bacillus, Streptomyces, fungi, such as
yeasts, and animal cells, such as CHO and mouse cells,
African green monkey cells, such as COS-1, COS-7,
BSC 1, BSC 40, and BMT 10, insect cells, and human
cells and plant cells in tissue culture. For animal
cell expression, we prefer CHO cells and COS-7 cells.
It should of course be understood that not
all vectors and expression control sequences will
function equally well to express the DNA sequences of
this invention. Neither will all hosts function
equally well with the same expression system. However,
one of skill in the art may make a selection among
these vectors, expression control sequences, and hosts
without undue experimentation and without departing
from the scope of this invention. For example, in
selecting a vector, the host must be considered because
the vector must replicate in it. The vector's copy
number, the ability to control that copy number, and
the expression of any other proteins encoded by the
vector, such as antibiotic markers, should also be
considered.
In selecting an expression control sequence,
a variety of factors should also be considered. These
include, for example, the relative strength of the
system, its controllability, and its compatibility with
the particular DNA sequence of this invention,
particularly as regards potential secondary structures.
Unicellular hosts should be selected by consideration
of their compatibility with the chosen vector, the
toxicity of the product coded for on expression by the
DNA sequences of this invention to them, their
secretion characteristics, their ability to fold
proteins correctly, their fermentation requirements,
and the ease of purification of the products coded on
expression by the DNA sequences of this invention.
Within these parameters, one of skill in the
art may select various vector/expression control
system/host combinations that will express the DNA
sequences of this invention on fermentation or in large
scale animal culture, e.g., CHO cells or COS-7 cells.
According to one embodiment of this
invention, a plasmid comprising a DNA sequence encoding
a CD4-gelsolin fusion polypeptide operatively linked to
a λPL promoter expression control sequence is expressed
in E.coli to produce a CD4-gelsolin fusion polypeptide.
The polypeptides and proteins produced on
expression of the DNA sequences of this invention may
be isolated from fermentation or animal cell cultures
and purified using any of a variety of conventional
methods. One of skill in the art may select the most
appropriate isolation and purification techniques
without departing from the scope of this invention.
One can also produce gelsolin fusion
polypeptides by chemical synthesis using conventional
peptide synthesis techniques, such as solid phase
synthesis [R.B. Merrifield, "Solid Phase Peptide
Synthesis. I. The Synthesis of a Tetrapeptide", J. Am.
Chem. Soc., 85, pp. 2149-54 (1963)].
Another method useful for producing gelsolin
fusion polypeptides, in addition to genetic fusion and
chemical synthesis, is by chemically coupling the
functional moiety to the gelsolin moiety. This method
is useful for both chemical moieties and polypeptide
moieties.
Several approaches are available for
chemically coupling the gelsolin moiety to a
polypeptide moiety. The preferable strategy is to
identify or create sites on the polypeptide moiety
through which it may be selectively linked to the
gelsolin moiety without destroying the activity of the
polypeptide moiety. Glycoproteins, such as CD4, have
limited numbers of sugars that are useful as
cross-linking sites. The sugars may be oxidized to aldehydes
and an aldehyde then reacted with an amine group on the
gelsolin moiety to create an aldehyde-amine linkage.
[P.K. Nakane and A. Kawaoi, "Peroxidase Labelled
Antibody: A New Method of Conjugation", J. Histochem.
Cytochem., 22, p. 1084 (1974) and T.-H. Liao et al.,
"Modification of Sialyl Residues of Sialoglycoprotein(s)
of the Human Erythrocyte Surface", J. Biol. Chem., 248,
pp. 8247-53 (1973)]. CD4 has two functional
glycosylation sites at amino acids +269 to +271 and
+298 to +300 (see SEQ ID NO:3). These are outside the
gp120 binding region, which is within the first 113
amino acids of the protein [B.H. Chao et al., "A
113-amino Acid Fragment of CD4 Produced in Escherichia coli
Blocks Human Immunodeficiency Virus-induced Cell
Fusion", J. Biol. Chem., 264, pp. 5812-17 (1989)].
Therefore, coupling CD4 through the carbohydrate should
not interfere with function. Alternatively, CD4 may be
genetically engineered to eliminate one of the
glycosylation sites. This would increase selectivity
during linkage. We describe aldehyde-amine linkages in
Example II using CD4.
Protein chemists have also developed specific
chemistries for covalently coupling polypeptides
through thiol groups. A polypeptide moiety having a
free thiol may be linked to a gelsolin moiety
containing a cysteine either by direct formation of a
disulfide bond or indirectly through a homo-
bifunctional crosslinker. One example of a homo-
bifunctional crosslinker is bismaleimidohexane (BMH)
which has thiol-reactive maleimide groups and forms
covalent bonds with free thiols. These methods require
the construction of a gelsolin moiety with a cysteine
at the amino- or carboxy-terminus. Peptide
synthesizers (Example II, Section 2) are useful in
these constructions.
If the polypeptide moiety does not have a
free thiol group, such a group may be introduced. For
example, the polypeptide may be bound to a thiol-
containing amine. More particularly, an oxidized sugar
on the polypeptide moiety may be reacted with the amine
as described above.
Also, a cysteine may be introduced into the
amino acid sequence of the polypeptide moiety by site-
directed mutagenesis.
Alternatively, the polypeptide moiety and the
gelsolin moiety may be crosslinked through hetero-
bifunctional crosslinking agents. These are chosen so
that one of the functional groups binds to a group on
the polypeptide moiety and the other binds to the thiol
on the gelsolin moiety. For example, a succinimide
group could bind to an amine group on the polypeptide
moiety and a thiol-reactive group, such as a maleimide
or an activated thiol, could bind to a cysteine on the
gelsolin moiety.
We describe methods involving thiol linkage
in Example III using CD4. The Pierce Co.
Immunotechnology Catalogue and Handbook Volume 1
E4-E12, E41-E48 and E31-E40 describes many useful
homo- and hetero-bifunctional crosslinkers, thiol-
containing amines and molecules with reactive groups.
Other methods useful for coupling both
polypeptide and chemical moieties include, for example,
those employing glutaraldehyde [M. Reichlin, "Use of
Glutaraldehyde as a Coupling Agent for Proteins and
Peptides", Methods in Enzymology, 70, pp. 159-65
(1980)], N-ethyl-N'-(3-dimethylaminopropyl)-carbodiimide
[T.L. Goodfriend et al., "Antibodies to
Bradykinin and Angiotensin: A Use of Carbodiimides in
Immunology", Science, 144, pp. 1344-46 (1964)] or a
mixture of N-ethyl-N'-(3-dimethylaminopropyl)-carbodiimide
and a succinylated carrier [M.H. Klapper
and I.M. Klotz, "Acylation with Dicarboxylic Acid
Anhydrides", Methods in Enzymology, 25, pp. 531-36
(1972)]. Since chemical coupling is not limited to one
site on the gelsolin moiety, it is possible to couple
more than one functional moiety to each gelsolin
moiety.
Multimeric and hetero-multimeric gelsolin
fusion constructs according to this invention may be
produced by binding gelsolin fusion polypeptides to
phospholipids aggregated into a vesicle. The vesicle
must comprise at least one phospholipid that binds to
gelsolin, but may contain others as well. The
phosphatidylinositols, PIP and PIP2, are preferable
components of the vesicle because they bind to
gelsolin. To be effective the vesicles preferably
contain at least 3% of PIP or PIP2. Other lipids that
may comprise the vesicle include, but are not limited
to, phosphatidylcholine (PC), phosphatidyl ethanolamine
(PE) and phosphatidylserine (PS). One may also create
vesicles containing detergents such as Triton.
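
The preferred minimum content of PIP or PIP2 can be checked with simple arithmetic, as in the Python sketch below. This sketch is illustrative only and is not part of the original disclosure. The text does not state whether the 3% figure is on a molar or a weight basis, nor whether PIP and PIP2 are counted together, so the sketch assumes that all amounts are given in one common unit and that the two lipids are summed. The function names are hypothetical.

```python
def pip_percent(amounts):
    """Percentage of the lipid mixture made up of PIP plus PIP2.

    'amounts' maps lipid names to amounts, all in the same unit
    (an assumption; the basis is not specified in the text)."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("the mixture must contain some lipid")
    pip = amounts.get("PIP", 0) + amounts.get("PIP2", 0)
    return 100.0 * pip / total


def meets_preferred_minimum(amounts, minimum_percent=3.0):
    """True if the mixture reaches the preferred minimum PIP/PIP2 content."""
    return pip_percent(amounts) >= minimum_percent
```

For example, a hypothetical mixture of 97 parts PC and 3 parts PIP2 would just meet the preferred minimum, whereas 99 parts PC and 1 part PIP2 would not.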
The production of phospholipid vesicles is
well known to the art [D.M. Haverstick and M. Glaser,
"Visualization of Ca2+-induced Phospholipid Domains",
Proc. Natl. Acad. Sci., USA, _, pp. 4475-79 (1987)].
For example, dried lipids are mixed with water and the
mixture is sonicated, producing vesicles. PIP should
be sonicated more thoroughly than PIP2 in order to
obtain vesicles of similar size and binding. The
gelsolin fusion polypeptide is then added and allowed
to bind to the vesicles. The resulting product is a
multimeric gelsolin fusion construct.
The fact that a vesicle may comprise many
different lipids and detergents allows great
flexibility in engineering a fusion construct with
desired characteristics. For example, one may produce
vesicles that bind different numbers of gelsolin fusion
polypeptides by varying the lipid composition of the
starting materials to create larger vesicles, or by
increasing the percentage of PIP or PIP2 in the vesicle.
Also, one may alter the half-life of the functional
moiety. We expect that these vesicles will be subject
to eventual degradation by lipases. By altering the
lipid composition of the vesicle, one could vary the
degradation rate of the vesicle.
When phospholipid vesicles containing
cavities are prepared in the presence of a bioactive
molecule, such as those illustrated herein, that
molecule will come to be enclosed within the vesicles.
Accordingly, it is possible to produce a multimeric
gelsolin fusion construct that encloses within it a
bioactive agent. These liposomes may fuse with cell
membranes, delivering their contents to cells and
adding the gelsolin fusion polypeptide to the cell
membrane.
Hetero-multimeric gelsolin fusion constructs
comprise at least two different functional moieties or
two different gelsolin moieties. For example, hetero-
multimeric gelsolin fusion constructs may comprise two
different polypeptide moieties, two different chemical
moieties or both a polypeptide moiety and a chemical
moiety.
Hetero-multimeric gelsolin fusion constructs
are especially useful when the properties of the
different moieties complement one another. For
example, it is possible to combine receptors that bind
to a particular target particle or cell with toxins or
anti-retroviral agents in fusion proteins according to
this invention to produce targeted toxic or anti-
retroviral agents. Polypeptides useful as toxins
include, but are not limited to, ricin, abrin,
angiogenin, Pseudomonas Exotoxin A, pokeweed antiviral
protein, saponin, gelonin and diphtheria toxin, or
toxic portions thereof. Useful anti-retroviral agents
include suramin, azidothymidine (AZT), dideoxycytidine
and glucosidase inhibitors such as castanospermine,
deoxynojirimycin and derivatives thereof.
Hetero-multimeric gelsolin fusion constructs
according to this invention are also useful as
diagnostic agents to identify the presence of a target
molecule in a sample or in vivo. Such proteins
comprise one functional moiety which is a recognition
molecule, such as an immunoglobulin or a fragment
thereof (Fab, dAb) that binds to the target molecule
[See Ward et al., supra] and a second functional
moiety, which is a reporter group, such as a
radionuclide, an enzyme (such as horseradish
peroxidase) or a fluorescent or chemiluminescent
marker. Typically, the reporter group will be bound
directly to the recognition molecule; for example, HRP is
bound directly to the immunoglobulin. Many reporter
groups may be coupled to a multimeric gelsolin fusion
construct, thereby enhancing the signal. These
constructs may be used, for example, to replace
antibodies as the recognition molecules that contact
the sample in ELISA-type assays, or as in vivo imaging
agents.
Hetero-multimeric gelsolin fusion constructs
according to this invention may also be used as
multi-vaccines. For example, one may produce such constructs
using several different antigenic determinants from the
same infective agent. Also, one can produce constructs
comprising antigenic determinants from several
infective agents, such as polio, measles, mumps and
others used for childhood vaccination.
The pharmaceutical compositions of this
invention typically comprise a pharmaceutically
effective amount of a multimeric gelsolin fusion
construct and a pharmaceutically acceptable carrier.
Therapeutic methods of this invention comprise the step
of treating patients in a pharmaceutically acceptable
manner with those compositions. These compositions may
be used to treat any mammal, including humans.
The pharmaceutical compositions of this
invention may be in a variety of forms. These include,
for example, solid, semi-solid and liquid dosage forms,
such as tablets, pills, powders, liquid solutions or
suspensions, liposomes, suppositories, injectable and
infusible solutions and sustained release forms. The
preferred form depends on the intended mode of
administration and therapeutic application. The
compositions also preferably include conventional
pharmaceutically acceptable carriers and adjuvants
which are known to those of skill in the art.
Generally, the pharmaceutical compositions of
the present invention may be formulated and
administered using methods and compositions similar to
those used for pharmaceutically important polypeptides
such as, for example, alpha interferon. The fusion
constructs of this invention may be administered by
conventional routes of administration, such as
parenteral, subcutaneous, intravenous, intramuscular or
intralesional routes. It will be understood that
conventional doses will vary depending upon the
particular molecular moiety involved.
In order that this invention may be better
understood, the following examples are set forth.
These examples are for the purposes of illustration
only, and are not to be construed as limiting the scope
of the invention in any manner.
In the examples that follow, the molecular
biology techniques employed, such as cloning, cutting
with restriction enzymes, isolating DNA fragments,
filling out with Klenow enzyme and deoxyribonucleotide
triphosphates (dXTP), ligating, transforming E.coli and
the like are conventional protocols exemplified and
further described in J. Sambrook et al., Molecular
Cloning, A Laboratory Manual, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York (1989).
EXAMPLE I - PRODUCTION OF A CD4-GELSOLIN FUSION
POLYPEPTIDE BY GENETIC FUSION
1. Cloning of pCD4-Gelsolin
We constructed a plasmid expression vector
containing a DNA sequence encoding a CD4-gelsolin
fusion polypeptide and used it to transform E.coli.
The coding region contains a DNA sequence for CD4(181)
fused to the 5' end of a 140 bp fragment encoding a 12
amino-acid spacer and amino acids 150-173 of gelsolin.
This includes the PIP binding domain. We constructed
the plasmid as follows. (See Figure 6.)
First, we produced a DNA sequence containing
the human gelsolin PIP2 binding domain. The PIP2
binding domain is encompassed within amino acids +150
to +169 (nucleotides 541-600) of Figure 1 (SEQ ID
NO:1). We created this DNA sequence from the plasmid
pMlD which contains the cDNA human gelsolin-encoding
sequence of Figure 1 (SEQ ID NO:1). (Plasmid pMlD was
the gift of David Kwiatkowski, Harvard Medical School,
Boston, Massachusetts.) We amplified a cDNA sequence
for the PIP2 binding domain using polymerase chain
reaction (PCR) (Sambrook et al., Chapter 14). We
carried out all amplifications using Taq DNA polymerase
and primers prephosphorylated with T4 polynucleotide
kinase and ATP. We used the oligonucleotide ACE 144
(SEQ ID NO:3) as the sense primer (which hybridizes to
the anti-sense strand) and ACE 145 (SEQ ID NO:~) as the
anti-sense primer. (See Figure 5.) We filled out the
amplified fragments with Klenow enzyme and dXTP. This
produced blunt-ended 140 bp DNA fragments having a
BglII site near the 5' end and an EcoRI site near the
3' end. The fragments encoded gelsolin amino acids
+143 through +173 (see SEQ ID NO:1).
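
The relationship between the amino acid and nucleotide coordinates recited above can be sketched in Python. This sketch is not part of the original disclosure. It assumes only the anchoring stated in the text, namely that amino acids +150 to +169 correspond to nucleotides 541 to 600 of Figure 1, and it further assumes that the numbering is contiguous (three nucleotides per amino acid) over the region of interest. The function names are hypothetical.

```python
# Anchor stated in the text: amino acid +150 begins at nucleotide 541.
ANCHOR_AMINO_ACID = 150
ANCHOR_NUCLEOTIDE = 541


def codon_start(amino_acid):
    """First nucleotide of the codon for a given amino acid position,
    assuming contiguous numbering of three nucleotides per residue."""
    return ANCHOR_NUCLEOTIDE + 3 * (amino_acid - ANCHOR_AMINO_ACID)


def region_span(first_aa, last_aa):
    """Nucleotide coordinates (start, end) covering first_aa..last_aa."""
    return codon_start(first_aa), codon_start(last_aa) + 2


def region_length(first_aa, last_aa):
    """Number of nucleotides encoding first_aa..last_aa inclusive."""
    start, end = region_span(first_aa, last_aa)
    return end - start + 1
```

Under these assumptions, region_span(150, 169) reproduces the stated coordinates 541 to 600, and amino acids +143 through +173 would occupy 93 nucleotides, so the balance of the 140 bp fragment would presumably derive from the primers.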
Then we digested an intermediate plasmid,
pNN03, with EcoRV and dephosphorylated the ends to
prevent recircularization. Plasmid pNN03 is derived
from pUC13 by the incorporation of a polylinker
(Pharmacia PL Biochemicals). We subcloned the 140 bp
fragments into this plasmid. We called the resulting
plasmid pGell.
We then inserted the BglII/EcoRI DNA fragment
encoding the gelsolin PIP2 binding domain from pGell
into a prokaryotic expression vector containing a DNA
sequence encoding CD4(181) and derived from pEX56.
Plasmid pEX56 encodes CD4(181) fused in-frame
to the 5' end of a DNA insert encoding
Pseudomonas endotoxin. The insert is bordered by EcoRI
sites at the 5' and 3' ends and contains a BglII site
at the junction of the CD4-endotoxin sequence. The
Pseudomonas endotoxin gene has been altered to remove
the ribosome binding region. Plasmid pEX56 is created
by site-directed mutagenesis of pEX46 (Example III,
section 2 and Figure 13 (SEQ ID NO:13)) with
oligonucleotide T4-AID 176 (Figure 5, SEQ ID NO:9).
[The plasmid is described in co-pending PCT application
PCT/US89/04584, incorporated herein by reference.]
We digested a first sample of pEX56 with
EcoRI and BglII and isolated the 613 bp fragment that
encodes CD4(181). Then we digested a second sample of
pEX56 with EcoRI, dephosphorylated the fragments, and
isolated the 3922 bp fragment representing the pEX56
vector portion. We ligated together the 3922 bp EcoRI
fragment, the 613 bp EcoRI/BglII fragment and the 140
bp BglII/EcoRI fragment. We used this ligation mixture
to transform E.coli JA221 [ATCC 33875] by standard
CaCl2 procedures. (See Sambrook, Chapter 1.82.) We
identified the plasmids pCD4-gelsolin and p~CD4-
gelsolin (opposite orientation and therefore non-
expressing) by restriction digests of mini-plasmid DNA
preparations. The plasmid map of pCD4-gelsolin is
shown in Figure 8. The DNA sequence and predicted
amino acid sequence of the CD4-gelsolin fusion
polypeptide obtained is shown in Figure 7 (SEQ ID
NO:10). We have deposited an isolate of pCD4-gelsolin
with In Vitro International, IVI-10253.
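
The three-fragment ligation described above can be checked for end compatibility with a short Python sketch. This sketch is illustrative only and is not part of the original disclosure. The class and function names are hypothetical, the fragment sizes are those recited in the text, and the total length is a simple sum of those sizes that ignores any bookkeeping for overhangs.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    name: str
    left_end: str    # restriction site at the 5' end of the fragment
    right_end: str   # restriction site at the 3' end of the fragment
    length_bp: int


def circular_assembly_ok(fragments):
    """True if each fragment's right end matches the left end of the
    next fragment, wrapping around to close the circle."""
    n = len(fragments)
    return all(
        fragments[i].right_end == fragments[(i + 1) % n].left_end
        for i in range(n)
    )


def total_length(fragments):
    """Approximate plasmid size as the sum of the fragment sizes."""
    return sum(f.length_bp for f in fragments)


# Fragments as recited in the text.
VECTOR = Fragment("pEX56 vector portion", "EcoRI", "EcoRI", 3922)
CD4_181 = Fragment("CD4(181)", "EcoRI", "BglII", 613)
GELSOLIN = Fragment("gelsolin PIP2 binding domain", "BglII", "EcoRI", 140)
```

Because the vector fragment carries EcoRI ends on both sides, a check of this kind cannot tell in which orientation the joined insert enters the vector. This is consistent with the recovery of both orientations and the use of restriction digests to distinguish them.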
2. Expression of CD4-Gelsolin
We transformed E.coli JA221 and E.coli A89
(an htpR- protease deficient mutant) with pCD4-gelsolin
and p~CD4-gelsolin. E.coli A89 is a tetracycline-
sensitive mutant of E.coli SG936 [ATCC 39624]. We then
tested the cultures for the production of CD4-gelsolin.
Our results showed that pCD4-gelsolin, but not paCD4-gelsolin,
produced a polypeptide of the molecular
weight predicted for CD4-gelsolin.
We grew 5 ml overnight cultures in LB + 12.5
µg/ml tetracycline at 30°C. We diluted the overnight
cultures 1:10 into LB + 12.5 µg/ml tetracycline and
grew the cultures until the optical density at 550 nm
was between 1 and 1.5. We then added the culture to an
equal volume of LB + 12.5 µg/ml tetracycline at 42°C.
After two hours we harvested the cells, lysed them, and
analyzed the contents for a protein band corresponding
to the size expected for a CD4-gelsolin fusion molecule
by SDS-polyacrylamide gel electrophoresis (SDS-PAGE).
We thus identified a protein having molecular weight of
about 28 kD.
We have isolated this protein using the
protocol of Example III, section 2b.
EXAMPLE II - CHEMICAL CROSS-LINKING OF A GELSOLIN
MOIETY TO CD4 VIA ALDEHYDE-AMINE LINKAGE
We cross-linked CD4(375) (a gift of Biogen,
Inc., Cambridge, Massachusetts) to a gelsolin moiety by
oxidizing sugars on the CD4 glycoprotein to aldehydes
and then reacting an aldehyde with an amine on the
gelsolin moiety to create an aldehyde-amine linkage.
1. Oxidation of CD4(375)
We dialyzed 100 µM CD4(375) protein against
0.1 M sodium acetate pH 5.0 at 4°C. We incubated the
preparation at 23°C for 1 hour with 1 mM aqueous sodium
periodate and immediately desalted it on a P6DG column
(BioRad, Richmond, California) that was equilibrated in
10 mM sodium acetate pH 5.0, 100 mM NaCl. We stored
the oxidized CD4(375) at 4°C for subsequent use or at
-70°C for long term storage. We monitored the extent
of oxidation by measuring incorporation of tritiated
sodium borohydride. Typically 8-10 aldehydes per
CD4(375) were generated.
To confirm that oxidation did not interfere
with the CD4(375) function, we assessed the ability of
the modified protein to bind gp120 in an ELISA format.
We coated IMMULON II plates (Dynatech Laboratories,
Chantilly, Virginia) with gp120 (a gift of Biogen,
Inc., and commercially available from American
Bio-Technologies, Inc., Cambridge, Massachusetts),
added CD4(375) or oxidized CD4(375), and then
determined the binding with a reporter system using
OKT4 antibody (available from Ortho Diagnostics
Systems, Raritan, New Jersey) that was conjugated with
horseradish peroxidase. There was no difference in
binding of soluble CD4 protein or oxidized CD4 to
gp120. Upon amino acid analysis, both samples were
also found to be similar with no apparent effect of
oxidation on individual amino acids.
2. Reaction of Oxidized CD4(375)
with the Gelsolin Moiety GEL1
We synthesized a gelsolin moiety, GEL1, using
an Applied Biosystems 430A peptide synthesizer. GEL1
has the amino acid sequence Gly-Tyr-Gly-Lys-His-Val-
Val-Pro-Asn-Glu-Val-Val-Val-Gln-Arg-Leu-Phe-Gln-Val-
Lys-Gly-Arg-Arg (SEQ ID NO:14). The final twenty amino
acids constitute the PIP2-binding sequence of gelsolin,
amino acids +150 to +169 (see SEQ ID NO:1). To
crosslink GEL1 with CD4(375), we incubated varying
concentrations of GEL1 overnight at 23°C with 10 µM
oxidized CD4(375) in the presence of 50 mM MES, pH 6.5,
and 5 mM sodium cyanoborohydride.
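The correspondence between GEL1 and the gelsolin coding sequence can be checked with a short sketch. It translates a 60-nucleotide stretch copied from the SEQ ID NO:1 listing (the end of the row numbered 540 through the start of the row numbered 600) and compares it with the last twenty residues of GEL1. The variable names and the choice of stretch are illustrative, and the standard genetic code is assumed.

```python
# Standard genetic code in compact form (codon positions ordered T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_TO_ONE = {
    "Gly": "G", "Tyr": "Y", "Lys": "K", "His": "H", "Val": "V", "Pro": "P",
    "Asn": "N", "Glu": "E", "Gln": "Q", "Arg": "R", "Leu": "L", "Phe": "F",
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# GEL1 as given in the text (SEQ ID NO:14).
GEL1 = ("Gly-Tyr-Gly-Lys-His-Val-Val-Pro-Asn-Glu-Val-Val-Val-"
        "Gln-Arg-Leu-Phe-Gln-Val-Lys-Gly-Arg-Arg")
gel1_one_letter = "".join(THREE_TO_ONE[res] for res in GEL1.split("-"))

# Stretch of SEQ ID NO:1 copied from the sequence listing (illustrative choice).
GELSOLIN_DNA = ("AAGCACGTGGTA"
                "CCCAACGAGGTGGTGGTGCAGAGACTCTTCCAGGTCAAAGGGCGGCGT")
gelsolin_fragment = translate(GELSOLIN_DNA)

print(gel1_one_letter)
print(gelsolin_fragment)
print(gel1_one_letter[-20:] == gelsolin_fragment)
```

The first three residues of GEL1 (Gly-Tyr-Gly) lie outside the compared stretch, consistent with the text's statement that only the final twenty amino acids come from gelsolin.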
We tested the sample for crosslinking by SDS-
PAGE. Samples were either analyzed directly by
staining with Coomassie brilliant blue or by Western
blotting using an antiserum raised in rabbits against
GEL1. The immunogen consisted of GEL1 crosslinked to
Keyhole limpet hemocyanin with glutaraldehyde.
We found a dose dependent increase in the
molecular weight of CD4 treated with GEL1, indicating
that the protein had become modified. At low peptide
concentrations, there was little effect on the mobility
of CD4(375) but when incubated with 1 mM GEL1,
approximately 50% of the CD4(375) migrated with an
increased apparent molecular weight that is consistent
with it containing one GEL1 peptide per CD4(375). When
CD4(375) was incubated with 10 mM GEL1, all of the
protein shifted to a high-molecular weight form. We
observed a series of bands that likely correspond to
species with one, two, and three gelsolin moieties per
CD4(375). The need for a large molar excess of GEL1
over CD4(375) to drive the crosslinking reaction is
consistent with the results obtained for modifying
periodate oxidized CD4 with other amino-containing
reagents as well. (See Example III.)
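The size of the molar excess follows directly from the stated concentrations. A minimal sketch of the arithmetic (the function and variable names are illustrative and not part of the protocol):

```python
def fold_molar_excess(reagent_uM, protein_uM):
    """Return the fold molar excess of reagent over protein (both in micromolar)."""
    return reagent_uM / protein_uM


CD4_UM = 10  # oxidized CD4(375) at 10 micromolar, as stated in the text

# GEL1 concentrations for which band shifts are reported: 1 mM and 10 mM.
excess_at_1_mM = fold_molar_excess(1_000, CD4_UM)
excess_at_10_mM = fold_molar_excess(10_000, CD4_UM)

print(excess_at_1_mM, excess_at_10_mM)
```

On these figures, about a 100-fold excess of peptide shifted roughly half of the protein, and a 1000-fold excess shifted all of it.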
To verify that GEL1 had been crosslinked to
CD4(375), we analyzed selected fractions by Western
blotting using antibodies against GEL1. A prominent
immunoreactive band was observed in the sample after
crosslinking. This band is absent from the Western
blot of an untreated CD4 sample.
3. Analysis of the CD4-Gelsolin
Fusion Polypeptide
We demonstrated above that the crosslinking
chemistry did not affect the ability of CD4(375) to
bind gp120. We have further established that CD4(375)-
gelsolin fusion polypeptides bind to PIP2 vesicles.
We assayed the ability of CD4(375)-gelsolin
to associate with PIP or PIP2 vesicles using an
aggregation assay similar to that described by Janmey
et al., "Phosphoinositide Micelles and
Polyphosphoinositide-containing Vesicles Dissociate
Endogenous Gelsolin-actin Complexes and Promote Actin
Assembly From the Fast-growing End of Actin Filaments
Blocked by Gelsolin", J. Biol. Chem., 262, pp. 12228-
36 (1987). In the assay, the amount of protein used is
appropriately adjusted to take into account the
molecular weight of the CD4-gelsolin fusion
polypeptide. Mg2+ causes micelles of pure
polyphosphoinositides to aggregate into larger
vesicles, increasing the turbidity of the solution.
However, gelsolin inhibits this aggregation. We found
that CD4(375)-gelsolin behaved like the GEL1 peptide in
this assay. Recombinant sCD4, alone, had no activity
in this assay.
Because the junction between the gelsolin
peptide fragment and the spacer is unnatural, it may be
necessary to change the composition or length of the
spacer region in order to optimize function. This
involves resynthesizing the gelsolin peptide fragment
with other sequences added at either the amino or
carboxy terminus of the polypeptide. The coupling
chemistry would not be affected. Alternatively, it may
be advantageous to change selected amino acids from the
binding sequence in order to change the affinity of the
fusion polypeptide for PIP2.
EXAMPLE III STRATEGIES FOR CROSSLINKING CD4
THROUGH THIOL GROUPS
We describe here three strategies for
crosslinking the CD4 polypeptide moiety with a gelsolin
moiety through thiol groups. They involve the
modification of the CD4 protein to contain a cysteine,
a free thiol or a thiol-reactive group.
1. Introducing a Free Thiol into CD4
First, a thiol group may be introduced into
CD4 using thiol-containing amines, such as cysteine,
cystamine or glutathione. An aldehyde is introduced
into CD4 and then one creates an aldehyde-amine linkage
(see Example II). Once the thiol-containing CD4 is
generated, it can be selectively crosslinked to the
gelsolin moiety.
We incubated periodate oxidized CD4(375)
(0.5 mg/ml) overnight at 23°C in 50 mM MES, pH 6.5,
5 mM sodium cyanoborohydride with 20 mM of either
cysteine, oxidized cystamine or oxidized glutathione to
create CD4(cysteine), CD4(cystamine), and
CD4(glutathione). We treated the samples with 40 mM
DTT for 40 minutes at 23°C. We then dialyzed them
against storage buffer (10 mM sodium acetate, pH 5.0,
100 mM NaCl). We monitored the extent of modification
with Ellman's reagent. Briefly, we diluted the samples
into 100 µl of 100 mM sodium phosphate pH 8.0, 0.5 mM
DTNB and measured the absorbance after 5 minutes at
410 nm. We calibrated the samples against a standard
curve that was developed with reduced glutathione.
Both cystamine and glutathione treatments resulted in
three to five thiol groups per CD4. For subsequent studies,
the preparations were concentrated to 5 mg/ml using a
CENTRICON-100 filtration unit (Amicon, Danvers,
Massachusetts).
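The calibration against a glutathione standard curve can be sketched as a linear fit followed by a ratio. The standard-curve readings, the sample absorbance and the protein concentration in the assay mixture below are hypothetical values chosen only to illustrate the calculation; none of them come from the experiments described here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def thiols_per_protein(a410, slope, intercept, protein_uM):
    """Convert an absorbance at 410 nm into thiol groups per protein molecule."""
    thiol_uM = (a410 - intercept) / slope
    return thiol_uM / protein_uM


# HYPOTHETICAL standard curve: reduced glutathione (micromolar) vs. A410.
GSH_UM = [0.0, 10.0, 20.0, 40.0]
A410_STANDARDS = [0.00, 0.14, 0.28, 0.56]

slope, intercept = fit_line(GSH_UM, A410_STANDARDS)

# HYPOTHETICAL sample: A410 of 0.28 with 5 micromolar modified CD4 in the assay.
ratio = thiols_per_protein(a410=0.28, slope=slope, intercept=intercept,
                           protein_uM=5.0)
print(round(ratio, 2))
```

With these illustrative numbers the sample works out to about four thiols per CD4, inside the three-to-five range reported above.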
These molecules may be bound to gelsolin
moieties through the thiol groups using homo-
bifunctional crosslinking agents with two thiol-
reactive groups, such as BMH or o- or p-phenylene
dimaleimide. We believe that this method will result
in crosslinking because treatment of CD4(cystamine)
with these agents induced the formation of CD4 dimers
and higher molecular weight complexes. With sub-
stoichiometric amounts of crosslinker we were able to
drive crosslinking of CD4 to greater than 50%. A
similar strategy will be used with the cysteine-
containing gelsolin moiety where a dimaleimide agent
will be used to generate crosslinking complexes.
Alternatively, the moieties may be
crosslinked through disulfide bonds using conventional
techniques.
2. Introducing a Free Cysteine into
CD4 by Site-Specific Mutagenesis
Second, a free cysteine may be introduced in
the primary sequence of CD4 through genetic
engineering. Crosslinking to the gelsolin moiety is
then directed using the methods of section 1 of this
example. We describe herein the construction and
isolation of two truncated forms of CD4 engineered to
contain cysteine residues at their C-termini.
a. Construction of pDC219
and Expression of CD4(111Cys)
To produce CD4(111Cys) we constructed the
expression plasmid pDC219. (See Figure 9.) We began
with p218-8, a plasmid in which the λPL promoter
controls the expression of CD4(111). This plasmid is
described in PCT patent application WO 89/0194,
p. 77/93, Figure 28. The DNA sequence for p218-8 is
depicted in Figure 10 (SEQ ID NO:11). We digested a
first sample of p218-8 with PstI and BglII and isolated
the 3645 bp fragment. We then digested a second sample
of p218-8 with PstI and EcoRI and isolated the 269 bp
fragment. We digested a third sample of p218-8 with
EcoRI and BspMI and isolated the 395 bp fragment. We
isolated these fragments by electrophoresing the
digests on agarose gels, cutting out the relevant bands
and electroeluting the DNA fragments. We precipitated
the electroeluted DNA fragments with ethanol,
centrifuged the mixture to pellet the DNA fragments and
resuspended the fragments in 10 mM Tris-HCl, pH 8.0,
1 mM Na2EDTA.
We phosphorylated oligonucleotides T4AID-133
(SEQ ID NO:5) and T4AID-134 (SEQ ID NO:6) (Figure 5)
using bacteriophage T4 polynucleotide kinase. These
oligonucleotides contain a BglII recognition sequence.
Then we ligated the purified DNA fragments and the
oligonucleotides.
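The presence of a BglII site in each oligonucleotide can be confirmed by a simple string search. The sketch below uses the oligonucleotide sequences as printed in the sequence listing (SEQ ID NO:5 and SEQ ID NO:6); the BglII recognition sequence AGATCT is supplied here and is not stated in the text, and the helper name is illustrative.

```python
BGLII_SITE = "AGATCT"  # BglII recognition sequence (supplied here, not from the text)

# Oligonucleotides as printed in the sequence listing.
OLIGOS = {
    "T4AID-133": "GGGGTGTTGATAGTAAGATCTTGCA",  # SEQ ID NO:5
    "T4AID-134": "AGATCTTACTATCAAGA",          # SEQ ID NO:6
}


def site_positions(seq, site):
    """Return the 1-based start positions of every occurrence of site in seq."""
    return [i + 1 for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


positions = {name: site_positions(seq, BGLII_SITE) for name, seq in OLIGOS.items()}
for name, found in positions.items():
    print(name, found)
```

Because the site is carried on the oligonucleotides, a correctly ligated plasmid gains a BglII site, which is what the restriction screen described next relies on.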
We used the reaction mixture to transform
E. coli DH1. We selected colonies that grew at 30°C in
12.5 µg/ml tetracycline and analyzed them for the
correct sequences by digestion with BglII. We
subjected those plasmid DNAs which had the additional
BglII site to DNA sequence analysis. Thus we obtained
pDC219.
To produce CD4(111Cys), we transformed A89
cells with pDC219 and fermented the cells at a 10 liter
scale. (We achieved an expression level of 13%.) We
stored cells as frozen cell pellets.
To isolate CD4(111Cys) we thawed 50 g frozen
whole cells, suspended them in 20 mM Tris pH 7.5, 1 mM
EDTA, 0.4 mg/ml lysozyme, and mixed with a Polytron
(Brinkman Instruments, Westbury, N.Y.). We stirred the
cell slurry at room temperature for one hour, then
passed it three times through a prechilled Manton
Gaulin French press (550 setting). We chilled the
lysate on ice between passages. We pelleted
particulates in a SA600 rotor for 15 minutes at 10,000
rpm. We washed the resulting pellet twice with a 1:4
dilution in 20 mM Tris pH 9.0 and pelleted it as
before. (All ratios given are whole cell weight to
buffer volume.) We then washed the pellet with a 1:4
dilution in 20 mM Tris pH 9.0 containing 0.5 M NaCl,
resuspending with a Polytron, and spun down the pellet
using the previous conditions. We extracted the final
pellet in a 1:4 dilution of extraction buffer (7 M
urea, 20 mM Tris pH 9.0, 10 mM β-mercaptoethanol) by
stirring at room temperature for 15 minutes. We
removed debris by centrifugation in a SA600 rotor at
15,000 x g for 30 minutes.
We diluted the clarified supernatant 1:4 with
fresh extraction buffer and passed it over a Fast S
cation exchange column (Pharmacia) pre-equilibrated
with extraction buffer at a column ratio of 1 gm whole
cells to 4 ml resin. We washed the column extensively
with extraction buffer. We then eluted the protein
with salt steps of half column volume of extraction
buffer containing 0.05 M, 0.075 M, 0.1 M, 0.15 M and
0.2 M NaCl, respectively. CD4(111Cys) routinely eluted
in the 0.15 M NaCl step.
We pooled the CD4(111Cys) peak and diluted it
to an absorbance of under O.D. 0.5 at 280 nm. Then we
dialyzed the sample overnight, 1:100 V:V, with one
change, against 3 M urea, 20 mM Tris pH 7.5. We
diluted the dialysate to 1 M urea with the 20 mM Tris
pH 7.5, and filtered it through a 0.45 µm sterile filter
unit. We bound CD4 from the filtrate to 6C6-Sepharose
for one hour at 4°C with rocking. 6C6 is a monoclonal
antibody developed at Biogen that recognizes CD4 and
blocks CD4 binding to gp120. Alternatively, one may
use anti-Leu-3a, a monoclonal available from Becton-
Dickinson, Mountain View, California. Then we poured
the slurry into a column and washed with 2 x 0.5 column
volumes 50 mM Tris pH 7.5, 0.5 M NaCl (wash 2), and 2 x
0.5 column volumes of wash 1 buffer (wash 3).
CD4(111Cys) was eluted from the resin with a series of
0.1 column volume additions of 50 mM glycine, pH 3.0,
250 mM NaCl. We neutralized the eluate by the addition
of 2 M Tris pH 9.0 to 50 mM.
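The volume of 2 M Tris needed for the neutralization step can be worked out from a simple mass balance. The sketch below assumes that "to 50 mM" means a final Tris concentration of 50 mM in the mixture and that the eluate contains no Tris beforehand; the function name and the 1 ml example volume are illustrative.

```python
def stock_volume_to_add(sample_ml, stock_mM, final_mM):
    """Volume of stock (ml) to add so the solute reaches final_mM in the mixture.

    Solves stock_mM * v = final_mM * (sample_ml + v) for v, assuming the
    sample initially contains none of the solute.
    """
    return sample_ml * final_mM / (stock_mM - final_mM)


# Example: 1 ml of glycine eluate, 2 M (2000 mM) Tris stock, 50 mM final.
tris_ml = stock_volume_to_add(sample_ml=1.0, stock_mM=2000, final_mM=50)
final_check_mM = 2000 * tris_ml / (1.0 + tris_ml)

print(round(tris_ml * 1000, 1), "microliters per ml of eluate")
```

Under these assumptions the addition is roughly 26 microliters of stock per milliliter of eluate.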
The resulting affinity purified protein was
90% CD4(111Cys) monomer with contaminating multimeric
bands. When run under reducing conditions these
additional bands collapsed into the monomer, indicating
they were disulfide-linked forms of the protein. From 1 gm
wet weight of cells we recovered between 0.5 and 0.75 mg
of CD4(111Cys). We assayed the gp120 binding activity
and found it to be about half the specific activity
that is observed for full length CD4.
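For planning purposes, the ratios given in this preparation can be scaled to the 50 g batch described above. This is only a sketch of the arithmetic: it assumes that the per-gram recovery scales linearly with cell mass, which the text does not state, and the constant names are illustrative.

```python
CELL_MASS_G = 50                  # frozen whole cells thawed for the preparation
BUFFER_ML_PER_G = 4               # 1:4, whole cell weight to buffer volume
RESIN_ML_PER_G = 4                # 1 gm whole cells to 4 ml Fast S resin
YIELD_MG_PER_G = (0.5, 0.75)      # recovered CD4(111Cys) per gm wet weight

wash_volume_ml = CELL_MASS_G * BUFFER_ML_PER_G
resin_volume_ml = CELL_MASS_G * RESIN_ML_PER_G
# Assumption: recovery scales linearly from 1 gm to the whole batch.
expected_yield_mg = tuple(CELL_MASS_G * y for y in YIELD_MG_PER_G)

print(wash_volume_ml, resin_volume_ml, expected_yield_mg)
```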
We carried out biotinylation studies using
maleimidobutyryl biocytin (MBB) to test the
susceptibility of the engineered cysteine to
modification with the maleimide. We monitored biotin
labeling on Western blots using avidin-conjugated HRP
to track the biotin. Specific biotin labeling of
CD4(111Cys) was observed when fresh samples were
analyzed; however, the efficiency of labeling decreased
with time as the samples aged.
b. Construction of λPL180CYS and
Expression of CD4(180Cys)
To produce CD4(180Cys), we constructed the
expression plasmid λPL180CYS, in which a λPL promoter
controls the expression of a DNA sequence encoding
CD4(180Cys). (See Figure 11.)
We began with plasmid pBG391, an animal cell
expression vector that expresses CD4(375). (The DNA
sequence of this plasmid is set forth in Figure 12 (SEQ
ID NO:12).) We cleaved pBG391 with StuI. StuI cuts
the CD4 gene at the codon for amino acid 182. We
phosphorylated oligonucleotides T4AID-137 (SEQ ID NO:7)
and T4AID-138 (SEQ ID NO:8) (Figure 5) and ligated them into
the StuI-cleaved pBG391. This generated pBG398C2. We
identified pBG398C2 by the presence of a BamHI site,
generated at the junction of the StuI site and T4AID-
137.
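The way a BamHI site arises at this junction can be illustrated with the oligonucleotide sequences from the sequence listing. The sketch relies on two facts not stated in the text and supplied here: StuI recognizes AGGCCT and cuts it bluntly between AGG and CCT, and BamHI recognizes GGATCC. It also assumes the orientation in which T4AID-137 reads 5' to 3' immediately after the upstream half of the StuI site. The names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Oligonucleotides as printed in the sequence listing.
T4AID_137 = "ATCCCTGTCCGTAGAAGCTTATCGAT"  # SEQ ID NO:7
T4AID_138 = "ATCGATAAGCTTCTACGGACAGGGAT"  # SEQ ID NO:8

STUI_UPSTREAM_HALF = "AGG"  # left half of AGG^CCT after blunt cleavage
BAMHI_SITE = "GGATCC"

# The two oligonucleotides anneal to form a blunt-ended duplex.
oligos_are_complementary = reverse_complement(T4AID_137) == T4AID_138

# Junction formed when the duplex is ligated behind the StuI half-site.
junction = STUI_UPSTREAM_HALF + T4AID_137
bamhi_index = junction.find(BAMHI_SITE)

print(oligos_are_complementary, junction[:9], bamhi_index)
```

The last two bases of the StuI half-site and the first four bases of T4AID-137 together spell GGATCC, so the site exists only in plasmids that have taken up the insert.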
Then we cleaved pBG398C2 with SacI and BglII
and isolated the 490 bp fragment. We cleaved PEX46
with SacI and BamHI and isolated the large fragment.
(The DNA sequence of PEX46 is set forth in Figure 13
(SEQ ID NO:13).) Then we ligated the two fragments
together. This generated plasmid λPL180CYS.
In 10 liter fermentations, CD4(180Cys) was
expressed at about 5% of the total cell protein.
We suspended fermentation cells at 8 ml/gm
cell wet weight in 20 mM Tris-HCl, 1 mM Na2EDTA, pH
7.7, broke them in two passes through a French press
and washed them twice with 20 ml/gm cell wet weight of
1 M guanidine-HCl, 1 M urea, 15 mM sodium acetate, pH 5
followed by two washes in 20 mM Tris-HCl, 1 mM Na2EDTA,
pH 7.7. We extracted the washed pellet with
25 ml/gm cell wet weight of 6 M guanidine-HCl, 20 mM
Tris-HCl, 10 mM DTT, pH 7.7 overnight at room
temperature. We spun the suspension for 45 minutes in
a SS-34 rotor at 20,000 rpm. We diluted the
supernatant 1:60 into cold 20 mM Tris-HCl, pH 7.7 and
added BSA to a final concentration of 0.5 mg/ml.
To generate microgram amounts of the protein,
we concentrated the diluted extract by ultrafiltration
using a PM10 membrane (Amicon) followed by affinity
purification on 6C6-Sepharose 4B. Alternatively,
CD4(180Cys) may be prepared as follows: The pH of the
diluted extract obtained as described above is lowered
to 7.0 with HCl and the extract is loaded at 1% vol/vol
onto a Fast S column equilibrated in 20 mM Tris-HCl,
pH 7.0. Bound protein is washed with 5 column volumes of
equilibration buffer and eluted with 0.2 M NaCl in the
same buffer. The elution pool is diluted with one
volume of 20 mM Tris-HCl, pH 7.7 and loaded on a
6C6-Sepharose 4B column. The bound protein is washed
and eluted from the affinity column in 50 mM glycine,
250 mM NaCl, pH 3.0. The elution fractions are
neutralized with 1/15 volume of 0.5 M HEPES pH 7.5,
pooled according to the A280 profile and stored at 4°C.
One may bind CD4(111Cys) or CD4(180Cys) to a
thiol-containing gelsolin moiety using the chemistries
described in section 1 of this example.
3. Hetero-bifunctional Crosslinking Agents
According to a third method, CD4 may also be
crosslinked with a cysteine-containing gelsolin moiety
using a hetero-bifunctional crosslinking agent. Such
crosslinkers include succinimidyl 4-(N-maleimidomethyl)
cyclohexane-1-carboxylate (SMCC), m-maleimidobenzoyl-
N-hydroxysuccinimide ester (MBS), or N-succinimidyl 3-
(2-pyridyldithio)propionate (SPDP). The
succinimidyl arms of these crosslinkers bind to primary
amines in CD4. The thiol-reactive group (maleimide) of SMCC
and MBS and the activated thiol of SPDP react with the
thiol from the cysteine in the gelsolin moiety to form
the covalent linkage.
To carry out the reaction with SMCC and MBS,
the crosslinker is incubated with CD4 for 0.5 hours at
pH 6.0 at 23°C. Unreacted crosslinker is then removed
on a desalting column. SPDP is used as described in
the Pharmacia Co. Users Manual. A gelsolin moiety
having a free terminal cysteine is then added. The
mixture is incubated for 3 hours at 23°C, creating the
covalent linkage. Unreacted gelsolin moiety is removed
on a desalting column.
The extent and specificity of the
modification can be analyzed as described in
Example II. The lysine content of CD4 is high;
therefore reactions with lysine would not provide much
specificity. However, by limiting the amount of
crosslinker added, it may be possible to direct
crosslinking to one or a small number of lysines that
are particularly reactive.
Alternatively, one may bind the thiol-reactive
group of the hetero-bifunctional crosslinker to a
thiol group introduced into CD4 and then bind the
succinimidyl arm to an amine in the gelsolin moiety.
EXAMPLE IV - MULTIMERIC GELSOLIN FUSION CONSTRUCT
We have shown that CD4-gelsolin fusion
polypeptides retain affinity for gp120 and that they
bind PIP2 vesicles through the gelsolin moiety. This
demonstrates that the chemistry we have developed to
produce multimeric gelsolin fusion constructs is sound.
As a next step, we produced and tested a multimeric
CD4(375)-gelsolin fusion construct.
Multimeric gelsolin fusion constructs
comprising CD4-gelsolin fusion polypeptides were
produced using methods that involve binding the fusion
polypeptides to PIP2 vesicles.
PIP2 vesicles were produced in the following
manner. PIP2 may be obtained as a lyophilized solid
(Sigma Chemical Co., St. Louis, Missouri). Water was
added to the dried sample to a concentration of 1 to 3
mg/ml and the mixture was sonicated for between 30
seconds and 2 minutes at maximum intensity in a Heat
Systems - Ultrasonics, Inc. (Farmingdale, New York)
W185 apparatus or its equivalent until an optically
clear solution formed. These samples were kept at 4°C
and used within a week or they were stored frozen for
future use. For storage, the samples were divided into
aliquots, frozen in liquid nitrogen and stored at -70°C
until use. Prior to use, the samples were thawed
quickly under a stream of warm water and sonicated for
30 minutes at room temperature in a water bath
sonicator.
CD4-gelsolin fusion polypeptides were then
added to lipid at a 5- to 10-fold molar excess of lipid over
protein and the mixture was incubated at room
temperature for about five minutes.
We tested the ability of the multimeric
CD4(375)-gelsolin fusion construct to bind gp120 in an
ELISA-type assay. Briefly, we coated plates with
gp120, added the fusion construct and assayed for
binding using anti-CD4 as the reporter antibody. We
did not detect binding of the multimeric CD4(375)-
gelsolin fusion construct to gp120.
We also tested the biological activity of the
fusion construct in a viral replication assay similar
to the one described in co-pending United States
application 07/583,022 (incorporated herein by
reference). Briefly, we incubated the fusion construct
with HIV, added cells from a T-cell line, and measured
the incidence of infection. Multimeric CD4(375)-
gelsolin fusion construct did not block infection in
this assay.
As a result, we found that rsCD4, itself,
binds to PIP2 vesicles and that in doing so, its
ability to bind gp120 is inactivated. Recombinant sCD4
has pockets of positive charge that cause it to bind to
cation exchange matrices with high avidity at neutral
pH. Since PIP2 vesicles, like cation exchange
matrices, possess high negative charge, we believe that
the binding of rsCD4 to PIP2 vesicles is due to its
ionic character.
Therefore, one may produce multimeric CD4-
gelsolin fusion constructs that bind gp120 by altering
the charge of the CD4 moiety so that it no longer binds
PIP2 vesicles. The first one-hundred-thirteen amino
acids of rsCD4, which contain the gp120 binding domain,
contain sixteen basic amino acid residues: thirteen
lysine residues and three arginine residues. Using
site specific mutagenesis, one may alter one or more of
these into histidine, a basic, but less polar amino
acid, or into neutral amino acids. Among these
alternate versions of CD4, one may select molecules
that bind gp120 but do not bind PIP2 vesicles. We
believe that these alternate versions of CD4 would be
useful to produce multimeric CD4-gelsolin fusion
constructs that possess gp120 binding ability.
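The count of basic residues in the first 113 amino acids can be reproduced from the CD4 coding sequence. The sketch below translates nucleotides 61 to 420 of SEQ ID NO:2 as printed in the sequence listing. It assumes that the mature protein begins at nucleotide 76, that is, after a 25-codon leader; this start point is an assumption and is not stated in this section. The standard genetic code is used and all names are illustrative.

```python
# Standard genetic code in compact form (codon positions ordered T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Rows of SEQ ID NO:2 numbered 120 through 420 in the listing (nucleotides 61-420).
SEQ2_NT_61_TO_420 = (
    "GCAGCCACTC AGGGAAAGAA AGTGGTGCTG GGCAAAAAAG GGGATACAGT GGAACTGACC "
    "TGTACAGCTT CCCAGAAGAA GAGCATACAA TTCCACTGGA AAAACTCCAA CCAGATAAAG "
    "ATTCTGGGAA ATCAGGGCTC CTTCTTAACT AAAGGTCCAT CCAAGCTGAA TGATCGCGCT "
    "GACTCAAGAA GAAGCTTGTG GGACCAAGGA AACTTTCCCC TGATCATCAA GAATCTTAAG "
    "ATAGAAGACT CAGATACTTA CATCTGTGAA GTGGAGGACC AGAAGGAGGA GGTGCAATTG "
    "CTAGTGTTCG GATTGACTGC CAACTCTGAC ACCCACCTGC TTCAGGGGCA GAGCCTGACC"
).replace(" ", "")

MATURE_START = 15   # ASSUMPTION: mature protein starts at nucleotide 76
N_RESIDUES = 113

mature_dna = SEQ2_NT_61_TO_420[MATURE_START:MATURE_START + 3 * N_RESIDUES]
first_113 = translate(mature_dna)
basic_counts = {"Lys": first_113.count("K"), "Arg": first_113.count("R")}

print(first_113[:10], basic_counts)
```

Each lysine or arginine located this way is a candidate position for the histidine or neutral substitutions proposed above.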
Although they do not bind gp120, multimeric
CD4(181)-gelsolin fusion constructs have other uses.
For example, they are useful as immunogens to elicit
α-CD4 antibodies. In diagnostic assays, they are useful
to detect the presence of α-CD4 in a sample. A
percentage of patients infected with HIV exhibit α-CD4
antibodies.
Positive charge at neutral pH and high salt
concentration is uncommon among proteins. Accordingly,
we do not believe that many proteins other than CD4
would exhibit deactivation when employed to produce
multimeric gelsolin fusion constructs according to this
invention. Nevertheless, the ionic character and
lipid-binding properties of potential functional
moieties are factors to be considered in predicting the
ultimate biological activity and characteristics of
multimeric gelsolin fusion constructs produced using
them.
Microorganisms and recombinant DNA molecules
according to this invention are exemplified by cultures
deposited in the In Vitro International, Inc. culture
collection, in Linthicum, Maryland, USA on May 4, 1990,
and identified as:
pCD4-gelsolin IVI-10253
p170.2 IVI-10252.
While we have hereinbefore described a number
of embodiments of this invention, it is apparent that
our basic embodiments can be altered to provide other
embodiments which utilize the processes and
compositions of this invention. Therefore, it will be
appreciated that the scope of this invention includes
all alternative embodiments and variations which are
defined in the foregoing specification and by the
claims appended hereto; and the invention is not to be
limited by the specific embodiments which have been
presented herein by way of example.
SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT: PEPINSKY, R. BLAKE
ROSA, MARGARET D.
STOSSEL, THOMAS P.
(ii) TITLE OF INVENTION: MULTIMERIC GELSOLIN FUSION CONSTRUCTS
(iii) NUMBER OF SEQUENCES: 14
(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: FISH & NEAVE
(B) STREET: 875 Third Avenue
(C) CITY: New York
(D) STATE: New York
(E) COUNTRY: United States of America
(F) ZIP: 10022
(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25
(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 07/520,368
(B) FILING DATE: 04-MAY-1990
(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Haley Jr., James F.
(B) REGISTRATION NUMBER: 27,794
(C) REFERENCE/DOCKET NUMBER: B144CIP
(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (212) 715-0600
(B) TELEFAX: (212) 715-0634
(C) TELEX: 14-8367
(2) INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2588 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
ATGGCTCCGC ACCGCCCCGC GCCCGCGCTG CTTTGCGCGC TGTCCCTGGC GCTGTGCGCG 60
CTGTCGCTGC CCGTCCGCGC GGCCACTGCG TCGCGGGGGG CGTCCCAGGC GGGGGCGCCC 120
CAGGGGCGGG TGCCCGAGGC GCGGCCCAAC AGCATGGTGG TGGAACACCC CGAGTTCCTC 180
AAGGCAGGGA AGGAGCCTGG CCTGCAGATC TGGCGTGTGG AGAAGTTCGA TCTGGTGCCC 240
GTGCCCACCA ACCTTTATGG AGACTTCTTC ACGGGCGACG CCTACGTCAT CCTGAAGACA 300
GTGCAGCTGA GGAACGGAAA TCTGCAGTAT GACCTCCACT ACTGGCTGGG CAATGAGTGC 360
AGCCAGGATG AGAGCGGGGC GGCCGCCATC TTTACCGTGC AGCTGGATGA CTACCTGAAC 420
GGCCGGGCCG TGCAGCACCG TGAGGTCCAG GGCTTCGAGT CGGCCACCTT CCTAGGCTAC 480
TTCAAGTCTG GCCTGAAGTA CAAGAAAGGA GGTGTGGCAT CAGGATTCAA GCACGTGGTA 540
CCCAACGAGG TGGTGGTGCA GAGACTCTTC CAGGTCAAAG GGCGGCGTGT GGTCCGTGCC 600
ACCGAGGTAC CTGTGTCCTG GGAGAGCTTC AACAATGGCG ACTGCTTCAT CCTGGACCTG 660
GGCAACAACA TCCACCAGTG GTGTGGTTCC AACAGCAATC GGTATGAAAG ACTGAAGGCC 720
ACACAGGTGT CCAAGGGCAT CCGGGACAAC GAGCGGAGTG GCCGGGCCCG AGTGCACGTG 780
TCTGAGGAGG GCACTGAGCC CGAGGCGATG CTCCAGGTGC TGGGCCCCAA GCCGGCTCTG 840
CCTGCAGGTA CCGAGGACAC CGCCAAGGAG GATGCGGCCA ACCGCAAGCT GGCCAAGCTC 900
TACAAGGTCT CCAATGGTGC AGGGACCATG TCCGTCTCCC TCGTGGCTGA TGAGAACCCC 960
TTCGCCCAGG GGGCCCTGAA GTCAGAGGAC TGCTTCATCC TGGACCACGG CAAAGATGGG 1020
AAAATCTTTG TCTGGAAAGG CAAGCAGGCA AACACGGAGG AGAGGAAGGC TGCCCTCAAA 1080
ACAGCCTCTG ACTTCATCAC CAAGATGGAC TACCCCAAGC AGACTCAGGT CTCGGTCCTT 1140
CCTGAGGGCG GTGAGACCCC ACTGTTCAAG CAGTTCTTCA AGAACTGGCG GGACCCAGAC 1200
CAGACAGATG GCCTGGGCTT GTCCTACCTT TCCAGCCATA TCGCCAACGT GGAGCGGGTG 1260
CCCTTCGACG CCGCCACCCT GCACACCTCC ACTGCCATGG CCGCCCAGCA CGGCATGGAT 1320
GACGATGGCA CAGGCCAGAA ACAGATCTGG AGAATCGAAG GTTCCAACAA GGTGCCCGTG 1380
GACCCTGCCA CATATGGACA GTTCTATGGA GGCGACAGCT ACATCATTCT GTACAACTAC 1440
CGCCATGGTG GCCGCCAGGG GCAGATAATC TATAACTGGC AGGGTGCCCA GTCTACCCAG 1500
GATGAGGTCG CTGCATCTGC CATCCTGACT GCTCAGCTGG ATGAGGAGCT GGGAGGTACC 1560
CCTGTCCAGA GCCGTGTGGT CCAAGGCAAG GAGCCCGCCC ACCTCATGAG CCTGTTTGGT 1620
GGGAAGCCCA TGATCATCTA CAAGGGCGGC ACCTCCCGCG AGGGCGGGCA GACAGCCCCT 1680
GCCAGCACCC GCCTCTTCCA GGTCCGCGCC AACAGCGCTG GAGCCACCCG GGCTGTTGAG 1740
GTATTGCCTA AGGCTGGTGC ACTGAACTCC AACGATGCCT TTGTTCTGAA AACCCCCTCA 1800
GCCGCCTACC TGTGGGTGGG TACAGGAGCC AGCGAGGCAG AGAAGACGGG GGCCCAGGAG 1860
CTGCTCAGGG TGCTGCGGGC CCAACCTGTG CAGGTGGCAG AAGGCAGCGA GCCAGATGGC 1920
TTCTGGGAGG CCCTGGGCGG GAAGGCTGCC TACCGCACAT CCCCACGGCT GAAGGACAAG 1980
AAGATGGATG CCCATCCTCC TCGCCTCTTT GCCTGCTCCA ACAAGATTGG ACGTTTTGTG 2040
ATCGAAGAGG TTCCTGGTGA GCTCATGCAG GAAGACCTGG CAACGGATGA CGTCATGCTT 2100
CTGGACACCT GGGACCAGGT CTTTGTCTGG GTTGGAAAGG ATTCTCAAGA AGAAGAAAAG 2160
ACAGAAGCCT TGACTTCTGC TAAGCGGTAC ATCGAGACGG ACCCAGCCAA TCGGGATCGG 2220
CGGACGCCCA TCACCGTGGT GAAGCAAGGC TTTGAGCCTC CCTCCTTTGT GGGCTGGTTC 2280
CTTGGCTGGG ATGATGATTA CTGGTCTGTG GACCCCTTGG ACAGGGCCAT GGCTGAGCTG 2340
GCTGCCTGAG GAGGGGCAGG GCCCACCCAT GTCACCGGTC AGTGCCTTTT GGAACTGTCC 2400
TTCCCTCAAA GAGGCCTTAG AGCGAGCAGA GCAGCTCTGC TATGAGTGTG TGTGTGTGTG 2460
TGTGTTGTTT CTTTTTTTTT TTTTTACAGT ATCCAAAAAT AGCCCTGCAA AAATTCAGAG 2520
TCCTTGCAAA ATTGTCTAAA ATGTCAGTGT TTGGGAAATT AAATCCAATA AAAACATTTT 2580
GAAGTGTG 2588
(2) INFORMATION FOR SEQ ID NO:2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1377 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
ATGAACCGGG GAGTCCCTTT TAGGCACTTG CTTCTGGTGC TGCAACTGGC GCTCCTCCCA 60
GCAGCCACTC AGGGAAAGAA AGTGGTGCTG GGCAAAAAAG GGGATACAGT GGAACTGACC 120
TGTACAGCTT CCCAGAAGAA GAGCATACAA TTCCACTGGA AAAACTCCAA CCAGATAAAG 180
ATTCTGGGAA ATCAGGGCTC CTTCTTAACT AAAGGTCCAT CCAAGCTGAA TGATCGCGCT 240
GACTCAAGAA GAAGCTTGTG GGACCAAGGA AACTTTCCCC TGATCATCAA GAATCTTAAG 300
ATAGAAGACT CAGATACTTA CATCTGTGAA GTGGAGGACC AGAAGGAGGA GGTGCAATTG 360
CTAGTGTTCG GATTGACTGC CAACTCTGAC ACCCACCTGC TTCAGGGGCA GAGCCTGACC 420
CTGACCTTGG AGAGCCCCCC TGGTAGTAGC CCCTCAGTGC AATGTAGGAG TCCAAGGGGT 480
AAAAACATAC AGGGGGGGAA GACCCTCTCC GTGTCTCAGC TGGAGCTCCA GGATAGTGGC 540
ACCTGGACAT GCACTGTCTT GCAGAACCAG AAGAAGGTGG AGTTCAAAAT AGACATCGTG 600
GTGCTAGCTT TCCAGAAGGC CTCCAGCATA GTCTACAAGA AAGAGGGGGA ACAGGTGGAG 660
TTCTCCTTCC CACTCGCCTT TACAGTTGAA AAGCTGACGG GCAGTGGCGA GCTGTGGTGG 720
CAGGCGGAGA GGGCTTCCTC CTCCAAGTCT TGGATCACCT CTGACCTGAA GAACAAGGAA 780
GTGTCTGTAA AACGGGTTAC CCAGGACCCT AAGCTCCAGA TGGGCAAGAA GCTCCCGCTC 840
CACCTCACCC TGCCCCAGGC CTTGCCTCAG TATGCTGGCT CTGGAAACCT CACCCTGGCC 900
CTTGAAGCGA AAACAGGAAA GTTGCATCAG GAAGTGAACC TGGTGGTGAT GAGAGCCACT 960
CAGCTCCAGA AAAATTTGAC CTGTGAGGTG TGGGGACCCA CCTCCCCTAA GCTGATGCTG 1020
AGCTTGAAAC TGGAGAACAA GGAGGCAAAG GTCTCGAAGC GGGAGAAGGC GGTGTGGGTG 1080
CTGAACCCTG AGGCGGGGAT GTGGCAGTGT CTGCTGAGTG ACTCGGGACA GGTCCTGCTG 1140
GAATCCAACA TCAAGGTTCT GCCCACATGG TCGACCCCGG TGCAGCCAAT GGCCCTGATT 1200
GTGCTGGGGG GCGTCGCCGG CCTCCTGCTT TTCATTGGGC TAGGCATCTT CTTCTGTGTC 1260
AGGTGCCGGC ACCGAAGGCG CCAAGCAGAG CGGATGTCTC AGATCAAGAG ACTCCGCAGT 1320
GAGAAGAAGA CCTGCCAGTG CCCTCACCGG TTTCAGAAGA CATGTAGCCC CATTTGA 1377
(2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 37 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
AGATCTACGG GGGCGTGGCA TCAGGATTCA AGCACGT 37
(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GAATTCTTAG GCACGGACCA CACGCCG 27
(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
GGGGTGTTGA TAGTAAGATC TTGCA 25
(2) INFORMATION FOR SEQ ID NO:6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
AGATCTTACT ATCAAGA 17
(2) INFORMATION FOR SEQ ID NO:7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
ATCCCTGTCC GTAGAAGCTT ATCGAT 26
(2) INFORMATION FOR SEQ ID NO:8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
ATCGATAAGC TTCTACGGAC AGGGAT 26
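As a reading aid rather than part of the listing itself: SEQ ID NO:7 and SEQ ID NO:8, as printed, each state a length of 26 and appear to be reverse complements of one another. The following minimal Python sketch checks both points from the printed text. The helper names `parse_listing_line` and `reverse_complement` are hypothetical and introduced only for illustration.

```python
def parse_listing_line(line: str) -> tuple[str, int]:
    """Split a printed listing line into (bases, running count).

    Assumes the layout used in this listing: space-separated groups
    of bases followed by a trailing integer count.
    """
    *groups, count = line.split()
    return "".join(groups), int(count)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# Lines copied from the printed entries for SEQ ID NO:7 and SEQ ID NO:8.
seq7, stated7 = parse_listing_line("ATCCCTGTCC GTAGAAGCTT ATCGAT 26")
seq8, stated8 = parse_listing_line("ATCGATAAGC TTCTACGGAC AGGGAT 26")

lengths_ok = len(seq7) == stated7 and len(seq8) == stated8
is_complementary = reverse_complement(seq7) == seq8
print(lengths_ok, is_complementary)
```

The same length check can be applied to any other single-line entry in the listing to catch characters lost in reproduction.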
(2) INFORMATION FOR SEQ ID NO:9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GGAGGACCAG AAAGAAGAAG TTCAGCTGCT GGTTTTCGGA TTGACT 46
(2) INFORMATION FOR SEQ ID NO:10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 654 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
ATGAAAAAAG TAGTACTGGG CAAAAAAGGG GATACAGTGG AACTGACCTG TACAGCTTCC 60
CAGAAGAAGA GCATACAATT CCACTGGAAA AACTCCAACC AGATAAAGAT TCTGGGAAAT 120
CAGGGCTCCT TCTTAACTAA AGGTCCATCC AAGCTGAATG ATCGCGCTGA CTCAAGAAGA 180
AGCTTGTGGG ACCAAGGAAA CTTTCCCCTG ATCATCAAGA ATCTTAAGAT AGAAGACTCA 240
GATACTTACA TCTGTGAAGT GGAGGACCAG AAAGAAGAAG TTCAGCTGCT GGTTTTCGGA 300
TTGACTGCCA ACTCTGACAC CCACCTGCTT CAGGGGCAGA GCCTGACCCT GACCTTGGAG 360
AGCCCCCCTG GTAGTAGCCC CTCAGTGCAA TGTAGGAGTC CAAGGGGTAA AAACATACAG 420
GGGGGGAAGA CCCTCTCCGT GTCTCAGCTG GAGCTCCAGG ATAGTGGCAC CTGGACATGC 480
ACTGTCTTGC AGAACCAGAA GAAGGTGGAG TTCAAAATAG ACATCGTGGT GCTAGCTTTC 540
CAGAAGGGGA AGATCTACGG GGGCGTGGCA TCAGGATTCA AGCACGTGGT ACCCAACGAG 600
GTGGTGGTGC AGAGACTCTT CCAGGTCAAA GGGCGGCGTG TGGTCCGTGC CTAA 654
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4309 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
GAATTCTTAC ACTTAGTTAA ATTGCTAACT TTATAGATTA CAAAACTTAG GAAATCGATT 60
TGGATGAAAA AAGTAGTACT GGGCAAAAAA GGGGATACAG TGGAACTGAC CTGTACAGCT 120
TCCCAGAAGA AGAGCATACA ATTCCACTGG AAAAACTCCA ACCAGATAAA GATTCTGGGA 180
AATCAGGGCT CCTTCTTAAC TAAAGGTCCA TCCAAGCTGA ATGATCGCGC TGACTCAAGA 240
AGAAGCTTGT GGGACCAAGG AAACTTTCCC CTGATCATCA AGAATCTTAA GATAGAAGAC 300
TCAGATACTT ACATCTGTGA AGTGGAGGAC CAGAAGGAGG AGGTGCAATT GCTAGTGTTC 360
GGATTGACTG CCAACTCTGA CACCCACCTG CTTCAGGGGT GATAGTAAGA TCCTGCAGCC 420
CAGCTTGGGG ACCCTAGAGG TCCCCTTTTT TATTTTGAAT TGGGAGATCC CAATTCTCAT 480
GTTTGACAGC TTATCATCGA TAAGCTAGCT TTAATGCGGT AGTTTATCAC AGTTAAATTG 540
CTAACGCAGT CAGGCACCGT GTATGAAATC TAACAATGCG CTCATCGTCA TCCTCGGCAC 600
CGTCACCCTG GATGCTGTAG GCATAGGCTT GGTTATGCCG GTACTGCCGG GCCTCTTGCG 660
GGATATCGTC CATTCCGACA GCATCGCCAG TCACTATGGC GTGCTGCTAG CGCTATATGC 720
GTTGATGCAA TTTCTATGCG CACCCGTTCT CGGAGCACTG TCCGACCGCT TTGGCCGCCG 780
CCCAGTCCTG CTCGCTTCGC TACTTGGAGC CACTATCGAC TACGCGATCA TGGCGACCAC 840
ACCCGTCCTG TGGATTCTCT ACGCCGGACG CATCGTGGCC GGCATCACCG GCGCCACAGG 900
TGCGGTTGCT GGCGCCTATA TCGCCGACAT CACCGATGGG GAAGATCGGG CTCGCCACTT 960
CGGGCTCATG AGCGCTTGTT TCGGCGTGGG TATGGTGGCA GGCCCCGTGG CCGGGGGACT 1020
GTTGGGCGCC ATCTCCTTGC ACGCACCATT CCTTGCGGCG GCGGTGCTCA ACGGCCTCAA 1080
CCTACTACTG GGCTGCTTCC TAATGCAGGA GTCGCATAAG GGAGAGCGTC GTCCGATGCC 1140
CTTGAGAGCC TTCAACCCAG TCAGCTCCTT CCGGTGGGCG CGGGGCATGA CTATCGTCGC 1200
CGCACTTATG ACTGTCTTCT TTATCATGCA ACTCGTAGGA CAGGTGCCGG CAGCGCTCTG 1260
GGTCATTTTC GGCGAGGACC GCTTTCGCTG GAGCGCGACG ATGATCGGCC TGTCGCTTGC 1320
GGTATTCGGA ATCTTGCACG CCCTCGCTCA AGCCTTCGTC ACTGGTCCCG CCACCAAACG 1380
TTTCGGCGAG AAGCAGGCCA TTATCGCCGG CATGGCGGCC GACGCGCTGG GCTACGTCTT 1440
GCTGGCGTTC GCGACGCGAG GCTGGATGGC CTTCCCCATT ATGATTCTTC TCGCTTCCGG 1500
CGGCATCGGG ATGCCCGCGT TGCAGGCCAT GCTGTCCAGG CAGGTAGATG ACGACCATCA 1560
GGGACAGCTT CAAGGATCGC TCGCGGCTCT TACCAGCCTA ACTTCGATCA CTGGACCGCT 1620
GATCGTCACG GCGATTTATG CCGCCTCGGC GAGCACATGG AACGGGTTGG CATGGATTGT 1680
AGGCGCCGCC CTATACCTTG TCTGCCTCCC CGCGTTGCGT CGCGGTGCAT GGAGCCGGGC 1740
CACCTCGACC TGAATGGAAG CCGGCGGCAC CTCGCTAACG GATTCACCAC TCCAAGAATT 1800
GGAGCCAATC AATTCTTGCG GAGAACTGTG AATGCGCAAA CCAACCCTTG GCAGAACATA 1860
TCCATCGCGT CCGCCATCTC CAGCAGCCGC ACGCGGCGCA TCTCGGGGGA TGATCAGCTG 1920
CCTCGCGCGT TTCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC CGGAGACGGT 1980
CACAGCTTGT CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG CGTCAGCGGG 2040
TGTTGGCGGG TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG GAGTGTATAC 2100
TGGCTTAACT ATGCGGCATC AGAGCAGATT GTACTGAGAG TGCACCATAT GCGGTGTGAA 2160
ATACCGCACA GATGCGTAAG GAGAAAATAC CGCATCAGGC GCTCTTCCGC TTCCTCGCTC 2220
ACTGACTCGC TGCGCTCGGT CGTTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAAAGGCG 2280
GTAATACGGT TATCCACAGA ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC 2340
CAGCAAAAGG CCAGGAACCG TAAAAAGGCC GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC 2400
CCCCCTGACG AGCATCACAA AAATCGACGC TCAAGTCAGA GGTGGCGAAA CCCGACAGGA 2460
CTATAAAGAT ACCAGGCGTT TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC 2520
CTGCCGCTTA CCGGATACCT GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAA 2580
TGCTCACGCT GTAGGTATCT CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG 2640
CACGAACCCC CCGTTCAGCC CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC 2700
AACCCGGTAA GACACGACTT ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA 2760
GCGAGGTATG TAGGCGGTGC TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT 2820
AGAAGGACAG TATTTGGTAT CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGAGTT 2880
GGTAGCTCTT GATCCGGCAA ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG 2940
CAGCAGATTA CGCGCAGAAA AAAAGGATCT CAAGAAGATC CTTTGATCTT TTCTACGGGG 3000
TCTGACGCTC AGTGGAACGA AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA 3060
AGGATCTTCA CCTAGATCCT TTTCAGATCT CCCGATCTTT AGCTGTCTTG GTTTGCCCAA 3120
AGCGCATTGC ATAATCTTTC AGGGTTATGC GTTGTTCCAT ACAACCTCCT TAGTACATGC 3180
AACCATTATC ACCGCCAGAG GTAAAATAGT CAACACGCAC GGTGTTAGAT ATTTATCCCT 3240
TGCGGTGATA GATTTAACGT ATGAGCACAA AAAAGAAACC ATTAACACAA GAGCAGCTTG 3300
AGGACGCACG TCGCCTTAAA GCAATTTATG AAAAAAAGAA AAATGAACTT GGCTTATCCC 3360
AGGAATCTGT CGCAGACAAG ATGGGGATGG GGCAGTCAGG CGTTGGTGCT TTATTTAATG 3420
GCATCAATGC ATTAAATGCT TATAACGCCG CATTGCTTAC AAAAATTCTC AAAGTTAGCG 3480
TTGAAGAATT TAGCCCTTCA ATCGCCAGAG AAATCTACGA GATGTATGAA GCGGTTAGTA 3540
TGCAGCCGTC ACTTAGAAGT GAGTATGAGT ACCCTGTTTT TTCTCATGTT CAGGCAGGGA 3600
TGTTCTCACC TAAGCTTAGA ACCTTTACCA AAGGTGATGC GGAGAGATGG GTAAGCACAA 3660
CCAAAAAAGC CAGTGATTCT GCATTCTGGC TTGAGGTTGA AGGTAATTCC ATGACCGCAC 3720
CAACAGGCTC CAAGCCAAGC TTTCCTGACG GAATGTTAAT TCTCGTTGAC CCTGAGCAGG 3780
CTGTTGAGCC AGGTGATTTC TGCATAGCCA GACTTGGGGG TGATGAGTTT ACCTTCAAGA 3840
AACTAATTAG GGATAGCGGT CAGGTGTTTT TACAACCACT AAACCCACAG TACCCAATGA 3900
TCCCATGCAA TGAGAGTTGT TCCGTTGTGG GGAAAGTTAT CGCTAGTCAG TGGCCTGAAG 3960
AGACGTTTGG CTGATCGGCA AGGTGTTCTG GTCGGCGCAT AGCTGATAAC AATTGAGCAA 4020
GAATCTTCAT CGGGGCTGCA GCCCACGATG CGTCCGGCGT AGAGGATCTC TCACCTACCA 4080
AACAATGCCC CCCTGCAAAA AATAAATTCA TATAAAAAAC ATACAGATAA CCATCTGCGG 4140
TGATAAATTA TCTCTGGCGG TGTTGACATA AATACCACTG GCGGTGATAC TGAGCACATC 4200
AGCAGGACGC ACTGACCACC ATGAAGGTGA CGCTCTTAAA ATTAAGCCCT GAAGAAGGGC 4260
AGCATTCAAA GCAGAAGGCT TTGGGGTGTG TGATACGAAA CGAAGCATT 4309
(2) INFORMATION FOR SEQ ID NO:12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6151 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
GAATTAATTC CAGCTTGCTG TGGAATGTGT GTCAGTTAGG GTGTGGAAAG TCCCCAGGCT 60
CCCCAGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA GTCAGCAACC AGGTGTGGAA 120
AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA 180
CCATAGTCCC GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 240
CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCGGCCT 300
CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG GGTCCTCCTC GTATAGAAAC 360
TCGGACCACT CTGAGACGAA GGCTCGCGTC CAGGCCAGCA CGAAGGAGGC TAAGTGGGAG 420
GGGTAGCGGT CGTTGTCCAC TAGGGGGTCC ACTCGCTCCA GGGTGTGAAG ACACATGTCG 480
CCCTCTTCGG CATCAAGGAA GGTGATTGGT TTATAGGTGT AGGCCACGTG ACCGGGTGTT 540
CCTGAAGGGG GGCTATAAAA GGGGGTGGGG GCGCGTTCGT CCTCACTCTC TTCCGCATCG 600
CTGTCTGCGA GGGCCAGCTG TTGGGCTCGC GGTTGAGGAC AAACTCTTCG CGGTCTTTCC 660
AGTACTCTTG GATCGGAAAC CCGTCGGCCT CCGAACGGTA CTCCGCCACC GAGGGACCTG 720
AGCGAGTCCG CATCGACCGG ATCGGAAAAC CTCTCGAGAA AGGCGTCTAA CCAGTCACAG 780
TCGCAAGGTA GGCTGAGCAC CGTGGCGGGC GGCAGCGGGT GGCGGTCGGG GTTGTTTCTG 840
GCGGAGGTGC TGCTGATGAT GTAATTAAAG TAGGCGGTCT TGAGACGGCG GATGGTCGAG 900
GTGAGGTGTG GCAGGCTTGA GATCGATCTG GCCATACACT TGAGTGACAA TGACATCCAC 960
TTTGCCTTTC TCTCCACAGG TGTCCACTCC CAGGTCCAAC TGGATCCAAG CTTCGACTCG 1020
AGGAATTCCC CGAAGGAACA AAGCACCCTC CCCACTGGGC TCCTGGTTGC AGAGCTCCAA 1080
GTCCTCACAC AGATACGCCT GTTTGAGAAG CAGCGGGCAA GAAAGACGCA AGCCCAGAGG 1140
CCCTGCCATT TCTGTGGGCT CAGGTCCCTA CTGGCTCAGG CCCCTGCCTC CCTCGGCAAG 1200
GCCACAATGA ACCGGGGAGT CCCTTTTAGG CACTTGCTTC TGGTGCTGCA ACTGGCGCTC 1260
CTCCCAGCAG CCACTCAGGG AAAGAAAGTG GTGCTGGGCA AAAAAGGGGA TACAGTGGAA 1320
CTGACCTGTA CAGCTTCCCA GAAGAAGAGC ATACAATTCC ACTGGAAAAA CTCCAACCAG 1380
ATAAAGATTC TGGGAAATCA GGGCTCCTTC TTAACTAAAG GTCCATCCAA GCTGAATGAT 1440
CGCGCTGACT CAAGAAGAAG CTTGTGGGAC CAAGGAAACT TTCCCCTGAT CATCAAGAAT 1500
CTTAAGATAG AAGACTCAGA TACTTACATC TGTGAAGTGG AGGACCAGAA GGAGGAGGTG 1560
CAATTGCTAG TGTTCGGATT GACTGCCAAC TCTGACACCC ACCTGCTTCA GGGGCAGAGC 1620
CTGACCCTGA CCTTGGAGAG CCCCCCTGGT AGTAGCCCCT CAGTGCAATG TAGGAGTCCA 1680
AGGGGTAAAA ACATACAGGG GGGGAAGACC CTCTCCGTGT CTCAGCTGGA GCTCCAGGAT 1740
AGTGGCACCT GGACATGCAC TGTCTTGCAG AACCAGAAGA AGGTGGAGTT CAAAATAGAC 1800
ATCGTGGTGC TAGCTTTCCA GAAGGCCTCC AGCATAGTCT ATAAGAAAGA GGGGGAACAG 1860
GTGGAGTTCT CCTTCCCACT CGCCTTTACA GTTGAAAAGC TGACGGGCAG TGGCGAGCTG 1920
TGGTGGCAGG CGGAGAGGGC TTCCTCCTCC AAGTCTTGGA TCACCTTTGA CCTGAAGAAC 1980
AAGGAAGTGT CTGTAAAACG GGTTACCCAG GACCCTAAGC TCCAGATGGG CAAGAAGCTC 2040
CCGCTCCACC TCACCCTGCC CCAGGCCTTG CCTCAGTATG CTGGCTCTGG AAACCTCACC 2100
CTGGCCCTTG AAGCGAAAAC AGGAAAGTTG CATCAGGAAG TGAACCTGGT GGTGATGAGA 2160
GCCACTCAGC TCCAGAAAAA TTTGACCTGT GAGGTGTGGG GACCCACCTC CCCTAAGCTG 2220
ATGCTGAGTT TGAAACTGGA GAACAAGGAG GCAAAGGTCT CGAAGCGGGA GAAGGCGGTG 2280
TGGGTGCTGA ACCCTGAGGC GGGGATGTGG CAGTGTCTGC TGAGTGACTC GGGACAGGTC 2340
CTGCTGGAAT CCAACATCAA GGTTCTGCCC ACATGGTCGA CCCCGGTGCA GCCAATGGCC 2400
CTGATTTGAG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 2460
ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 2520
ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 2580
GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 2640
TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 2700
AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 2760
TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 2820
CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 2880
TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 2940
TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG GTTAATAAGG AATATTTGAT 3000
GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 3060
CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 3120
TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 3180
TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 3240
TATCTTATCA TGTCTGGATC CTCTACGCCG GACGCATCGT GGCCGGCATC ACCGGCGCCA 3300
CAGGTGCGGT TGCTGGCGCC TATATCGCCG ACATCACCGA TGGGGAAGAT CGGGCTCGCC 3360
ACTTCGGGCT CATGAGCGCT TGTTTCGGCG TGGGTATGGT GGCAGGCCCG TGGCCGGGGG 3420
ACTGTTGGGC GCCATCTCCT TGCATGCACC ATTCCTTGCG GCGGCGGTGC TCAACGGCCT 3480
CAACCTACTA CTGGGCTGCT TCCTAATGCA GGAGTCGCAT AAGGGAGAGC GTCGACCGAT 3540
GCCCTTGAGA GCCTTCAACC CAGTCAGCTC CTTCCGGTGG GCGCGGGGCA TGACTATCGT 3600
CGCCGCACTT ATGACTGTCT TCTTTATCAT GCAACTCGTA GGACAGGTGC CGGCAGCGCT 3660
CTGGGTCATT TTCGGCGAGG ACCGCTTTCG CTGGAGCGCG ACGATGATCG GCCTGTCGCT 3720
TGCGGTATTC GGAATCTTGC ACGCCCTCGC TCAAGCCTTC GTCACTGGTC CCGCCACCAA 3780
ACGTTTCGGC GAGAAGCAGG CCATTATCGC CGGCATGGCG GCCGACGCGC TGGGCTACGT 3840
CTTGCTGGCG TTCGCGACGC GAGGCTGGAT GGCCTTCCCC ATTATGATTC TTCTCGCTTC 3900
CGGCGGCATC GGGATGCCCG CGTTGCAGGC CATGCTGTCC AGGCAGGTAG ATGACGACCA 3960
TCAGGGACAG CTTCAAGGAT CGCTCGCGGC TCTTACCAGC CTAACTTCGA TCACTGGACC 4020
GCTGATCGTC ACGGCGATTT ATGCCGCCTC GGCGAGCACA TGGAACGGGT TGGCATGGAT 4080
TGTAGGCGCC GCCCTATACC TTGTCTGCCT CCCCGCGTTG CGTCGCGGTG CATGGAGCCG 4140
GGCCACCTCG ACCTGAATGG AAGCCGGCGG CACCTCGCTA ACGGATTCAC CACTCCAAGA 4200
ATTGGAGCCA ATCAATTCTT GCGGAGAACT GTGAATGCGC AAACCAACCC TTGGCAGAAC 4260
ATATCCATCG CGTCCGCCAT CTCCAGCAGC CGCACGCGGC GCATCTCGGG CCGCGTTGCT 4320
GGCGTTTTTC CATAGGCTCC GCCCCCCTGA CGAGCATCAC AAAAATCGAC GCTCAAGTCA 4380
GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG GAAGCTCCCT 4440
CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT TTCTCCCTTC 4500
GGGAAGCGTG GCGCTTTCTC AATGCTCACG CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT 4560
TCGCTCCAAG CTGGGCTGTG TGCACGAACC CCCCGTTCAG CCCGACCGCT GCGCCTTATC 4620
CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC TGGCAGCAGC 4680
CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT TCTTGAAGTG 4740
GTGGCCTAAC TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC TGCTGAAGCC 4800
AGTTACCTTC GGAAAAAGAG TTGGTAGCTC TTGATCCGGC AAACAAACCA CCGCTGGTAG 4860
CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT CTCAAGAAGA 4920
TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC GTTAAGGGAT 4980
TTTGGTCATG AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTAAATT AAAAATGAAG 5040
TTTTAAATCA ATCTAAAGTA TATATGAGTA AACTTGGTCT GACAGTTACC AATGCTTAAT 5100
CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG CCTGACTCCC 5160
CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT GGCCCCAGTG CTGCAATGAT 5220
ACCGCGAGAC CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC CAGCCGGAAG 5280
GGCCGAGCGC AGAAGTGGTC CTGCAACTTT ATCCGCCTCC ATCCAGTCTA TTAATTGTTG 5340
CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG TTGCCATTGC 5400
TGCAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT CCGGTTCCCA 5460
ACGATCAAGG CGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG 5520
TCCTCCGATC GTTGTCAGAA GTAAGTTGGC CGCAGTGTTA TCACTCATGG TTATGGCAGC 5580
ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC TTTTCTGTGA CTGGTGAGTA 5640
CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT GCCCGGCGTC 5700
AACACGGGAT AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA TTGGAAAACG 5760
TTCTTCGGGG CGAAAACTCT CAAGGATCTT ACCGCTGTTG AGATCCAGTT CGATGTAACC 5820
CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC ACCAGCGTTT CTGGGTGAGC 5880
AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG GCGACACGGA AATGTTGAAT 5940
ACTCATACTC TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT GTCTCATGAG 6000
CGGATACATA TTTGAATGTA TTTAGAAAAA TAAACAAATA GGGGTTCCGC GCACATTTCC 6060
CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC ATGACATTAA CCTATAAAAA 6120
TAGGCGTATC ACGAGGCCCT TTCGTCTTCA A 6151
(2) INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5727 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
GAATTCTTAC ACTTAGTTAA ATTGCTAACT TTATAGATTA CAAAACTTAG GAAATCGATT 60
TGGATGAAAA AAGTAGTACT GGGCAAAAAA GGGGATACAG TGGAACTGAC CTGTACAGCT 120
TCCCAGAAGA AGAGCATACA ATTCCACTGG AAAAACTCCA ACCAGATAAA GATTCTGGGA 180
AATCAGGGCT CCTTCTTAAC TAAAGGTCCA TCCAAGCTGA ATGATCGCGC TGACTCAAGA 240
AGAAGCTTGT GGGACCAAGG AAACTTTCCC CTGATCATCA AGAATCTTAA GATAGAAGAC 300
TCAGATACTT ACATCTGTGA AGTGGAGGAC CAGAAGGAGG AGGTGCAATT GCTAGTGTTC 360
GGATTGACTG CCAACTCTGA CACCCACCTG CTTCAGGGGC AGAGCCTGAC CCTGACCTTG 420
GAGAGCCCCC CTGGTAGTAG CCCCTCAGTG CAATGTAGGA GTCCAAGGGG TAAAAACATA 480
CAGGGGGGGA AGACCCTCTC CGTGTCTCAG CTGGAGCTCC AGGATAGTGG CACCTGGACA 540
TGCACTGTCT TGCAGAACCA GAAGAAGGTG GAGTTCAAAA TAGACATCGT GGTGCTAGCT 600
TTCCAGAAGG GGAAGATCTT TCCCGAGGGC GGCAGCCTGG CCGCGCTGAC CGCGCACCAG 660
GCTTGCCACC TGCCGCTGGA GACTTTCACC CGTCATCGCC AGCCGCGCGG CTGGGAACAA 720
CTGGAGCAGT GCGGCTATCC GGTGCAGCGG CTGGTCGCCC TCTACCTGGC GGCGCGGCTG 780
TCGTGGAACC AGGTCGACCA GGTGATCCGC AACGCCCTGG CCAGCCCCGG CAGCGGCGGC 840
GACCTGGGCG AAGCGATCCG CGAGCAGCCG GAGCAGGCCC GTCTGGCCCT GACCCTGGCC 900
GCCGCCGAGA GCGAGCGCTT CGTCCGGCAG GGCACCGGCA ACGACGAGGC CGGCGCGGCC 960
AACGCCGACG TGGTGAGCCT GACCTGCCCG GTCGCCGCCG GTGAATGCGC GGGCCCGGCG 1020
GACAGCGGCG ACGCCCTGCT GGAGCGCAAC TATCCCACTG GCGCGGAGTT CCTCGGCGAC 1080
GGCGGCGACG TCAGCTTCAG CACCCGCGGC ACGCAGAACT GGACGGTGGA GCGGCTGCTC 1140
CAGGCGCACC GCCAACTGGA GGAGCGCGGC TATGTGTTCG TCGGCTACCA CGGCACCTTC 1200
CTCGAAGCGG CGCAAAGCAT CGTCTTCGGC GGGGTGCGCG CGCGCAGCCA GGACCTCGAC 1260
GCGATCTGGC GCGGTTTCTA TATCGCCGGC GATCCGGCGC TGGCCTACGG CTACGCCCAG 1320
GACCAGGAAC CCGACGCACG CGGCCGGATC CGCAACGGTG CCCTGCTGCG GGTCTATGTG 1380
CCGCGCTCGA GCCTGCCGGG CTTCTACCGC ACCAGCCTGA CCCTGGCCGC GCCGGAGGCG 1440
GCGGGCGAGG TCGAACGGCT GATCGGCCAT CCGCTGCCGC TGCGCCTGGA CGCCATCACC 1500
GGCCCCGAGG AGGAAGGCGG GCGCCTGGAG ACCATTCTCG GCTGGCCGCT GGCCGAGCGC 1560
ACCGTGGTGA TTCCCTCGGC GATCCCCACC GACCCGCGCA ACGTCGGCGG CGACCTCGAC 1620
CCGTCCAGCA TCCCCGACAA GGAACAGGCG ATCAGCGCCC TGCCGGACTA CGCCAGCCAG 1680
CCCGGCAAAC CGCCGCGCGA GGACCTGAAG TAACTGCCGC GACCGGCCGG CTCCCTTCGC 1740
AGGAGCCGGC CTTCTCGGGG CCTGGCCATA CATCAGGTTT TCCTGATGCC AGCCCAATCG 1800
AATATGAATT CTCATCGATT TCCATGGGAT CCTGCAGCCC AGCTTGGGGA CCCTAGAGGT 1860
CCCCTTTTTT ATTTTTTGAA TTGGGAGATC CAATTCTCAT GTTTGACAGC TTATCATCGA 1920
AGCTAGCTTT AATGCGGTAG TTTATCACAG TTAAATTGCT AACGCAGTCA GGCACCGTGT 1980
ATGAAATCTA ACAATGCGCT CATCGTCATC CTCGGCACCG TCACCCTGGA TGCTGTAGGC 2040
ATAGGCTTGG TTATGCCGGT ACTGCCGGGC CTCTTGCGGG ATATCGTCCA TTCCGACAGC 2100
ATCGCCAGTC ACTATGGCGT GCTGCTAGCG CTATATGCGT TGATGCAATT TCTATGCGCA 2160
CCCGTTCTCG GAGCACTGTC CGACCGCTTT GGCCGCCGCC CAGTCCTGCT CGCTTCGCTA 2220
CTTGGAGCCA CTATCGACTA CGCGATCATG GCGACCACAC CCGTCCTGTG GATTCTCTAC 2280
GCCGGACGCA TCGTGGCCGG CATCACCGGC GCCACAGGTG CGGTTGCTGG CGCCTATATC 2340
GCCGACATCA CCGATGGGGA AGATCGGGCT CGCCACTTCG GGCTCATGAG CGCTTGTTTC 2400
GGCGTGGGTA TGGTGGCAGG CCCCGTGGCC GGGGGACTGT TGGGCGCCAT CTCCTTGCAC 2460
GCACCATTCC TTGCGGCGGC GGTGCTCAAC GGCCTCAACC TACTACTGGG CTGCTTCCTA 2520
ATGCAGGAGT CGCATAAGGG AGAGCGTCGT CCGATGCCCT TGAGAGCCTT CAACCCAGTC 2580
AGCTCCTTCC GGTGGGCGCG GGGCATGACT ATCGTCGCCG CACTTATGAC TGTCTTCTTT 2640
ATCATGCAAC TCGTAGGACA GGTGCCGGCA GCGCTCTGGG TCATTTTCGG CGAGGACCGC 2700
TTTCGCTGGA GCGCGACGAT GATCGGCCTG TCGCTTGCGG TATTCGGAAT CTTGCACGCC 2760
CTCGCTCAAG CCTTCGTCAC TGGTCCCGCC ACCAAACGTT TCGGCGAGAA GCAGGCCATT 2820
ATCGCCGGCA TGGCGGCCGA CGCGCTGGGC TACGTCTTGC TGGCGTTCGC GACGCGAGGC 2880
TGGATGGCCT TCCCCATTAT GATTCTTCTC GCTTCCGGCG GCATCGGGAT GCCCGCGTTG 2940
CAGGCCATGC TGTCCAGGCA GGTAGATGAC GACCATCAGG GACAGCTTCA AGGATCGCTC 3000
GCGGCTCTTA CCAGCCTAAC TTCGATCACT GGACCGCTGA TCGTCACGGC GATTTATGCC 3060
GCCTCGGCGA GCACATGGAA CGGGTTGGCA TGGATTGTAG GCGCCGCCCT ATACCTTGTC 3120
TGCCTCCCCG CGTTGCGTCG CGGTGCATGG AGCCGGGCCA CCTCGACCTG AATGGAAGCC 3180
GGCGGCACCT CGCTAACGGA TTCACCACTC CAAGAATTGG AGCCAATCAA TTCTTGCGGA 3240
GAACTGTGAA TGCGCAAACC AACCCTTGGC AGAACATATC CATCGCGTCC GCCATCTCCA 3300
GCAGCCGCAC GCGGCGCATC TCGGGGGATG ATCAGCTGCC TCGCGCGTTT CGGTGATGAC 3360
GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 3420
GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCGCA 3480
GCCATGACCC AGTCACGTAG CGATAGCGGA GTGTATACTG GCTTAACTAT GCGGCATCAG 3540
AGCAGATTGT ACTGAGAGTG CACCATATGC GGTGTGAAAT ACCGCACAGA TGCGTAAGGA 3600
GAAAATACCG CATCAGGCGC TCTTCCGCTT CCTCGCTCAC TGACTCGCTG CGCTCGGTCG 3660
TTCGGCTGCG GCGAGCGGTA TCAGCTCACT CAAAGGCGGT AATACGGTTA TCCACAGAAT 3720
CAGGGGATAA CGCAGGAAAG AACATGTGAG CAAAAGGCCA GCAAAAGGCC AGGAACCGTA 3780
AAAAGGCCGC GTTGCTGGCG TTTTTCCATA GGCTCCGCCC CCCTGACGAG CATCACAAAA 3840
ATCGACGCTC AAGTCAGAGG TGGCGAAACC CGACAGGACT ATAAAGATAC CAGGCGTTTC 3900
CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG TTCCGACCCT GCCGCTTACC GGATACCTGT 3960
CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC TTTCTCAATG CTCACGCTGT AGGTATCTCA 4020
GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA CGAACCCCCC GTTCAGCCCG 4080
ACCGCTGCGC CTTATCCGGT AACTATCGTC TTGAGTCCAA CCCGGTAAGA CACGACTTAT 4140
CGCCACTGGC AGCAGCCACT GGTAACAGGA TTAGCAGAGC GAGGTATGTA GGCGGTGCTA 4200
CAGAGTTCTT GAAGTGGTGG CCTAACTACG GCTACACTAG AAGGACAGTA TTTGGTATCT 4260
GCGCTCTGCT GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG TAGCTCTTGA TCCGGCAAAC 4320
AAACCACCGC TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA GCAGATTACG CGCAGAAAAA 4380
AAGGATCTCA AGAAGATCCT TTGATCTTTT CTACGGGGTC TGACGCTCAG TGGAACGAAA 4440
ACTCACGTTA AGGGATTTTG GTCATGAGAT TATCAAAAAG GATCTTCACC TAGATCCTTT 4500
TCAGATCTCC CGATCTTTAG CTGTCTTGGT TTGCCCAAAG CGCATTGCAT AATCTTTCAG 4560
GGTTATGCGT TGTTCCATAC AACCTCCTTA GTACATGCAA CCATTATCAC CGCCAGAGGT 4620
AAAATAGTCA ACACGCACGG TGTTAGATAT TTATCCCTTG CGGTGATAGA TTTAACGTAT 4680
GAGCACAAAA AAGAAACCAT TAACACAAGA GCAGCTTGAG GACGCACGTC GCCTTAAAGC 4740
AATTTATGAA AAAAAGAAAA ATGAACTTGG CTTATCCCAG GAATCTGTCG CAGACAAGAT 4800
GGGGATGGGG CAGTCAGGCG TTGGTGCTTT ATTTAATGGC ATCAATGCAT TAAATGCTTA 4860
TAACGCCGCA TTGCTTACAA AAATTCTCAA AGTTAGCGTT GAAGAATTTA GCCCTTCAAT 4920
CGCCAGAGAA ATCTACGAGA TGTATGAAGC GGTTAGTATG CAGCCGTCAC TTAGAAGTGA 4980
GTATGAGTAC CCTGTTTTTT CTCATGTTCA GGCAGGGATG TTCTCACCTA AGCTTAGAAC 5040
CTTTACCAAA GGTGATGCGG AGAGATGGGT AAGCACAACC AAAAAAGCCA GTGATTCTGC 5100
ATTCTGGCTT GAGGTTGAAG GTAATTCCAT GACCGCACCA ACAGGCTCCA AGCCAAGCTT 5160
TCCTGACGGA ATGTTAATTC TCGTTGACCC TGAGCAGGCT GTTGAGCCAG GTGATTTCTG 5220
CATAGCCAGA CTTGGGGGTG ATGAGTTTAC CTTCAAGAAA CTAATTAGGG ATAGCGGTCA 5280
GGTGTTTTTA CAACCACTAA ACCCACAGTA CCCAATGATC CCATGCAATG AGAGTTGTTC 5340
CGTTGTGGGG AAAGTTATCG CTAGTCAGTG GCCTGAAGAG ACGTTTGGCT GATCGGCAAG 5400
GTGTTCTGGT CGGCGCATAG CTGATAACAA TTGAGCAAGA ATCTTCATCG GGGCTGCAGC 5460
CCACGATGCG TCCGGCGTAG AGGATCTCTC ACCTACCAAA CAATGCCCCC CTGCAAAAAA 5520
TAAATTCATA TAAAAAACAT ACAGATAACC ATCTGCGGTG ATAAATTATC TCTGGCGGTG 5580
TTGACATAAA TACCACTGGC GGTGATACTG AGCACATCAG CAGGACGCAC TGACCACCAT 5640
GAAGGTGACG CTCTTAAAAT TAAGCCCTGA AGAAGGGCAG CATTCAAAGC AGAAGGCTTT 5700
GGGGTGTGTG ATACGAAACG AAGCATT 5727
(2) INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
Gly Tyr Gly Lys His Val Val Pro Asn Glu Val Val Val Gln Arg Leu
1               5                   10                  15
Phe Gln Val Lys Gly Arg Arg