Note: Descriptions are shown in the official language in which they were submitted.
CA 02406381 2005-07-08
MAIZE CELLULOSE SYNTHASES AND USES THEREOF
TECHNICAL FIELD
The present invention relates generally to plant molecular biology. More
specifically, it relates to nucleic acids and methods for modulating their
expression in plants.
BACKGROUND OF THE INVENTION
Polysaccharides constitute the bulk of the plant cell walls and have been
traditionally classified into three categories: cellulose, hemicellulose, and
pectin. Fry, S. C.
(1988), The growing plant cell wall: Chemical and metabolic analysis, New
York: Longman
Scientific & Technical. Whereas cellulose is made at the plasma membrane and
directly laid
down into the cell wall, hemicellulosic and pectic polymers are first made in
the Golgi apparatus
and then exported to the cell wall by exocytosis. Ray, P. M., et al., (1976),
Ber. Deutsch. Bot.
Ges. Bd. 89, 121-146. The variety of chemical linkages in the pectic and
hemicellulosic
polysaccharides indicates that there must be tens of polysaccharide synthases
in the Golgi
apparatus. Darvill et al., (1980). The primary cell walls of flowering plants.
In The Plant Cell (N.
E. Tolbert, ed.), Vol. 1 in Series: The biochemistry of plants: A
comprehensive treatise, eds. P.K.
Stumpf and E.E. Conn (New York: Academic Press), pp. 91-162.
CA 02406381 2002-10-11
WO 01/79516 PCT/US01/11951
Even though sugar and polysaccharide compositions of the plant cell
walls have been well characterized, very limited progress has been made toward
identification of the enzymes involved in polysaccharide formation, the reason
being their labile nature and recalcitrance to solubilization by available
detergents. Sporadic claims for the identification of cellulose synthase from
plant sources have been made over the years. Callaghan, T., and Benziman, M.
(1984), Nature 311, 165-167; Okuda, et al., (1993), Plant Physiol. 101,
1131-1142. However, these claims have been met with skepticism. Callaghan, T.,
and Benziman, M. (1985), Nature 314, 383-384; Delmer, et al., (1993), Plant
Physiol. 103, 307-308. It was only recently that a putative gene for plant
cellulose synthase (CelA) was cloned from developing cotton fibers based on
homology to the bacterial gene, and a substrate binding and/or catalysis motif
was proposed, comprising conserved Asp residues and a conserved amino acid
sequence (QXXRW), corresponding, for example, to amino acids 300, 367, 462 and
498-502 of the A. tumefaciens CelA polypeptide, respectively. Pear, et al.,
Proc. Natl. Acad. Sci. (USA) 93, 12637-12642; Saxena, et al., (1990), Plant
Molecular Biology 15, 673-684; see also, WO 9818949.
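The QXXRW signature mentioned above lends itself to a simple pattern search. The following is a minimal sketch, assuming a protein sequence in one-letter code; the helper names `find_qxxrw` and `has_asp_upstream` and the toy sequence are hypothetical and are not taken from any sequence of the present disclosure.

```python
import re

# Hypothetical helper: Q, any two residues, then R and W (the QXXRW signature).
QXXRW = re.compile(r"Q..RW")

def find_qxxrw(protein):
    """Return (1-based start position, matched residues) for each QXXRW hit."""
    return [(m.start() + 1, m.group()) for m in QXXRW.finditer(protein.upper())]

def has_asp_upstream(protein, motif_start):
    """True if at least one Asp (D) lies N-terminal to the given 1-based motif start."""
    return "D" in protein.upper()[: motif_start - 1]

# Made-up toy sequence for illustration only (not a SEQ ID NO of this document).
toy = "MKDALLDGSVDPTTQVLRWALG"
for start, residues in find_qxxrw(toy):
    print(start, residues, has_asp_upstream(toy, start))
```

A pattern hit only flags a candidate; it says nothing about whether the residue spacing matches that reported for the bacterial polypeptide.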
Cellulose, by virtue of its ability to form semicrystalline microfibrils, has
a very high tensile strength which approaches that of some metals. Niklas, K.
J. (1992), Plant Biomechanics: An engineering approach to plant form and
function, The University of Chicago Press, p. 607. Bending strength of the
culm of normal and brittle-culm mutants of barley has been found to be
directly correlated with the concentration of cellulose in the cell wall.
Kokubo, et al., (1989), Plant Physiology 91, 876-882; Kokubo, et al., (1991)
Plant Physiology 97, 509-514.
As stalk composition contributes to numerous quality factors important in
maize breeding, what is needed in the art are products and methods for
manipulating
cellulose concentration in the cell wall and thereby altering plant stalk
quality to provide,
for example, increased standability or improved silage. The present invention
provides
these and other advantages.
SUMMARY OF THE INVENTION
Generally, it is an object of the present invention to provide nucleic acids
and
proteins relating to cellulose synthases. It is an object of the present
invention to provide
transgenic plants comprising the nucleic acids of the present invention, and
methods for
modulating, in a transgenic plant, expression of the nucleic acids of the
present invention.
Therefore, in one aspect the present invention relates to an isolated nucleic
acid
comprising a member selected from the group consisting of (a) a polynucleotide
having a
specified sequence identity to a polynucleotide encoding a polypeptide of the
present
invention; (b) a polynucleotide which is complementary to the polynucleotide
of (a); and,
(c) a polynucleotide comprising a specified number of contiguous nucleotides
from a
polynucleotide of (a) or (b). The isolated nucleic acid can be DNA.
In other aspects the present invention relates to: 1) recombinant expression
cassettes,
comprising a nucleic acid of the present invention operably linked to a
promoter,
2) a host cell into which has been introduced the recombinant expression
cassette, and 3) a
transgenic plant comprising the recombinant expression cassette. The host cell
and plant
are optionally from maize, wheat, rice, or soybean.
Another aspect of the invention is to provide an isolated nucleic acid
comprising a polynucleotide encoding a functional cellulose synthase
polypeptide,
wherein the polypeptide comprises a cellulose synthase substrate binding
and/or
catalysis motif, and having at least 90% sequence identity to the full length
sequence
of the polynucleotide of SEQ ID NO: 1, as determined by GAP having a creation
penalty of 50 and an extension penalty of 3.
Another aspect of the invention is to provide an isolated nucleic acid
comprising a polynucleotide selected from the group consisting of SEQ ID NO: 1
and
SEQ ID NO: 5.
Another aspect of the invention is to provide an isolated nucleic acid
comprising a polynucleotide encoding a polypeptide selected from the group
consisting
of SEQ ID NO: 2 and SEQ ID NO: 6.
Another aspect of the invention is to provide an isolated nucleic acid
comprising a polynucleotide which is complementary to a full length sequence
of the
polynucleotide of SEQ ID NO: 1 or SEQ ID NO: 5.
Another aspect of the invention is to provide a recombinant expression
cassette,
comprising the polynucleotide of any of the nucleic acids described above,
operably
linked, in sense or anti-sense orientation, to a promoter.
Another aspect of the invention is to provide a host cell transformed with the
recombinant expression cassette described above. The host cell may be a plant
cell.
The plant cell may be from a monocot or a dicot, and may be selected from the
group
consisting of: maize, soybean, sunflower, sorghum, canola, wheat, alfalfa,
cotton, rice,
barley, millet, peanut, and cocoa. The plant cell may be from a seed.
Another aspect of the invention is to provide a method of modulating the level
of cellulose synthase in a plant cell, comprising, (a) introducing into a
plant cell the
recombinant expression cassette described above, (b) culturing the plant cell
under
plant cell growing conditions, and (c) inducing expression of said
polynucleotide for a
time sufficient to modulate the level of cellulose synthase in said plant
cell. The plant
cell may be from maize, wheat, rice, or soybean.
Another aspect of the invention is to provide a method of modulating the level
of cellulose synthase in a plant, comprising: (a) introducing into a plant
cell the
recombinant expression cassette described above, (b) culturing the plant cell
under
plant cell growing conditions, (c) regenerating a plant from said plant cell,
and (d)
inducing expression of said polynucleotide for a time sufficient to modulate
the level
of cellulose synthase in said plant. The plant may be maize, wheat, rice, or
soybean.
Another aspect of the invention is to provide an isolated protein comprising a
polypeptide with cellulose synthase activity selected from the group
consisting of SEQ
ID NOS: 2 and 6.
Another aspect of the invention is to provide an isolated protein comprising a
polypeptide with cellulose synthase activity comprising a cellulose synthase
substrate
binding and/or catalysis motif, and having at least 90% sequence identity to,
and
having at least one epitope in common with the full length sequence of the
polypeptide
of SEQ ID NO: 2, wherein said sequence identity is determined by GAP having a
creation penalty of 8 and an extension penalty of 2.
Another aspect of the invention is to provide an isolated protein comprising
at
least one polypeptide with cellulose synthase activity encoded by a full
length
sequence of a polynucleotide described above.
Another aspect of the invention is to provide a method of down-regulating
expression of a cellulose synthase gene in a maize plant, comprising: (a)
providing a
population of maize plants mutagenized with a Mu transposable element, (b)
screening
the genomic DNA of each plant of the population of (a) for the presence of Mu
insertions within a polynucleotide described above, (c) identifying the plants
of (b) that
contain one or more Mu insertions within a polynucleotide described above, and
(d)
selecting those plants of (c) that show modified cellulose synthase gene
expression.
Another aspect of the invention is to provide a use of the polynucleotides
described above to modulate the level of cellulose synthase in a plant. The
level of
cellulose synthase may be increased.
DETAILED DESCRIPTION OF THE INVENTION
Overview
A. Nucleic Acids and Protein of the Present Invention
Unless otherwise stated, the polynucleotide and polypeptide sequences
identified in
Table 1 represent polynucleotides and polypeptides of the present invention.
Table 1 cross-references these polynucleotides and polypeptides to their gene
name and
internal database
identification number. A nucleic acid of the present invention comprises a
polynucleotide
of the present invention. A protein of the present invention comprises a
polypeptide of the
present invention.
Table 2 further provides a calculation of the percent identity/similarity of
the
referenced polynucleotide/polypeptide sequences to homologues identified using
methods such as the one disclosed in Example 4.
TABLE 1
Gene Name            Database ID NO:     Polynucleotide SEQ ID NO:   Polypeptide SEQ ID NO:
Cellulose synthase   Cdpgs45 (cesA-3)    1                           2
Cellulose synthase   Cqrael9 (cesA-9)    5                           6
B. Exemplary Utility of the Present Invention
The present invention provides utility in such exemplary applications as
improvement of stalk quality for improved stand or silage. Further, the
present invention
provides for an increased concentration of cellulose in the pericarp,
hardening the kernel
and thus improving its handling ability.
Definitions
Units, prefixes, and symbols may be denoted in their SI accepted form. Unless
otherwise indicated, nucleic acids are written left to right in 5' to 3'
orientation; amino acid
sequences are written left to right in amino to carboxy orientation,
respectively. Numeric
ranges recited within the specification are inclusive of the numbers defining
the range and
include each integer within the defined range. Amino acids may be referred to
herein by
either their commonly known three letter symbols or by the one-letter symbols
recommended by the IUPAC-IUBMB Nomenclature Commission. Nucleotides, likewise,
may be referred to by their commonly accepted single-letter codes. Unless
otherwise
provided for, software, electrical, and electronics terms as used herein are
as defined in The
New IEEE Standard Dictionary of Electrical and Electronics Terms (5th
edition, 1993). The
terms defined below are more fully defined by reference to the specification
as a whole.
Section headings provided throughout the specification are not limitations to
the various
objects and embodiments of the present invention.
By "amplified" is meant the construction of multiple copies of a nucleic acid
sequence or multiple copies complementary to the nucleic acid sequence using
at least one
of the nucleic acid sequences as a template. Amplification systems include the
polymerase
chain reaction (PCR) system, ligase chain reaction (LCR) system, nucleic acid
sequence
based amplification (NASBA, Cangene, Mississauga, Ontario), Q-Beta Replicase
systems,
transcription-based amplification system (TAS), and strand displacement
amplification
(SDA). See, e.g., Diagnostic Molecular Microbiology: Principles and
Applications, D. H.
Persing et al., Ed., American Society for Microbiology, Washington, D.C.
(1993). The
product of amplification is termed an amplicon.
As used herein, "antisense orientation" includes reference to a duplex
polynucleotide sequence that is operably linked to a promoter in an
orientation where the
antisense strand is transcribed. The antisense strand is sufficiently
complementary to an
endogenous transcription product such that translation of the endogenous
transcription
product is often inhibited.
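The relationship between a coding sequence and the transcript produced in antisense orientation can be illustrated with a short sketch. It assumes the input is the sense (coding) strand written 5' to 3'; the function names are hypothetical and chosen only for this illustration.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def sense_transcript(coding_dna):
    """mRNA-like transcript: same sequence as the coding strand, with U for T."""
    return coding_dna.upper().replace("T", "U")

def antisense_transcript(coding_dna):
    """Transcript complementary to the mRNA, written 5' to 3'."""
    return reverse_complement(coding_dna.upper()).replace("T", "U")

print(sense_transcript("ATGGCC"))      # AUGGCC
print(antisense_transcript("ATGGCC"))  # GGCCAU
```

Read 3' to 5', the antisense transcript pairs base for base with the sense transcript, which is the basis for the inhibition of translation described above.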
By "encoding" or "encoded", with respect to a specified nucleic acid, is meant
comprising the information for translation into the specified protein. A
nucleic acid
encoding a protein may comprise non-translated sequences (e.g., introns)
within translated
regions of the nucleic acid, or may lack such intervening non-translated
sequences (e.g., as
in cDNA). The information by which a protein is encoded is specified by the
use of codons.
Typically, the amino acid sequence is encoded by the nucleic acid using the
"universal"
genetic code. However, variants of the universal code, such as are present in
some plant,
animal, and fungal mitochondria, the bacterium Mycoplasma capricolum, or the
ciliate
Macronucleus, may be used when the nucleic acid is expressed therein.
When the nucleic acid is prepared or altered synthetically, advantage can be
taken of
known codon preferences of the intended host where the nucleic acid is to be
expressed.
For example, although nucleic acid sequences of the present invention may be
expressed in
both monocotyledonous and dicotyledonous plant species, sequences can be
modified to
account for the specific codon preferences and GC content preferences of
monocotyledons
or dicotyledons as these preferences have been shown to differ (Murray et al.
Nucl. Acids
Res. 17: 477-498 (1989)). Thus, the maize preferred codon for a particular
amino acid may
be derived from known gene sequences from maize. Maize codon usage for 28
genes from
maize plants is listed in Table 4 of Murray et al., supra.
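Deriving a preferred codon from known gene sequences, as described above, amounts to counting codon occurrences per amino acid. The sketch below assumes in-frame coding sequences and the standard ("universal") genetic code; the function names and the toy sequences are hypothetical and do not reproduce the data of Murray et al.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def preferred_codons(coding_seqs):
    """Map each amino acid to its most frequent codon in the given in-frame sequences."""
    counts = defaultdict(Counter)
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
            if aa is not None and aa != "*":
                counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}

def recode(protein, preferred):
    """Back-translate a protein using one preferred codon per amino acid."""
    return "".join(preferred[aa] for aa in protein)

# Made-up toy coding sequences, for illustration only.
toy_genes = ["ATGGCCGCCGCG", "ATGGCCCTGCTGCTC"]
table = preferred_codons(toy_genes)
print(recode("MAL", table))
```

In practice the counts would come from many genes of the intended host, and GC content preferences would be weighed alongside the per-codon frequencies.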
As used herein "full-length sequence" in reference to a specified
polynucleotide or
its encoded protein means having the entire amino acid sequence of a native
(non-
synthetic), endogenous, biologically (e.g., structurally or catalytically)
active form of the
specified protein. Methods to determine whether a sequence is full-length are
well known in
the art, including such exemplary techniques as northern or western blots,
primer extension,
S1 protection, and ribonuclease protection. See, e.g., Plant Molecular
Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997).
Comparison to known full-length homologous (orthologous and/or paralogous)
sequences can also be used to identify full-length sequences of the present
invention. Additionally, consensus sequences typically present at the 5' and
3' untranslated regions of mRNA aid in the identification of a polynucleotide
as full-length. For example, the consensus sequence ANNNNAUGG, where the AUG
codon represents the N-terminal methionine, aids in determining whether the
polynucleotide has a complete 5' end. Consensus sequences at the 3' end, such
as
polyadenylation sequences, aid in determining whether the polynucleotide has a
complete 3'
end.
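The two end checks described above can be sketched as simple pattern searches. This is only an illustration: the function names are hypothetical, and the use of AAUAAA as the polyadenylation signal and of a 50-nucleotide search window are assumptions not stated in the text.

```python
import re

# ANNNNAUGG: an A, any four nucleotides, then AUG followed by G.
START_CONTEXT = re.compile(r"A[ACGU]{4}AUGG")

def find_start_context(mrna):
    """Return the 0-based index of the A of the first AUG in ANNNNAUGG context, or None."""
    seq = mrna.upper().replace("T", "U")
    m = START_CONTEXT.search(seq)
    return None if m is None else m.start() + 5

def has_polya_signal(mrna, window=50):
    """True if AAUAAA (assumed signal) occurs within the last `window` nucleotides."""
    seq = mrna.upper().replace("T", "U")
    return "AAUAAA" in seq[-window:]

# Made-up toy transcript, for illustration only.
toy_mrna = "GGACCGGAUGGCUUAAGCAAUAAAGCUUC"
print(find_start_context(toy_mrna), has_polya_signal(toy_mrna))
```

A positive result from either check is supporting evidence only; the blot and protection assays listed above remain the direct tests.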
As used herein, "heterologous" in reference to a nucleic acid is a nucleic
acid that
originates from a foreign species, or, if from the same species, is
substantially modified
from its native form in composition and/or genomic locus by human
intervention. For
example, a promoter operably linked to a heterologous structural gene is from
a species
different from that from which the structural gene was derived, or, if from
the same
species, one or both are substantially modified from their original form. A
heterologous
protein may originate from a foreign species or, if from the same species, is
substantially
modified from its original form by human intervention.
By "host cell" is meant a cell which contains a vector and supports the
replication
and/or expression of the vector. Host cells may be prokaryotic cells such as
E. coli, or
eukaryotic cells such as yeast, insect, amphibian, or mammalian cells.
Preferably, host cells
are monocotyledonous or dicotyledonous plant cells. A particularly preferred
monocotyledonous host cell is a maize host cell.
The term "introduced" includes reference to the incorporation of a nucleic
acid into
a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated
into the genome
of the cell (e.g., chromosome, plasmid, plastid or mitochondrial DNA),
converted into an
autonomous replicon, or transiently expressed (e.g., transfected mRNA). The
term includes
such nucleic acid introduction means as "transfection", "transformation" and
"transduction".
The term "isolated" refers to material, such as a nucleic acid or a protein,
which is:
(1) substantially or essentially free from components which normally accompany
or interact
with it as found in its natural environment. The isolated material optionally
comprises
material not found with the material in its natural environment; or (2) if the
material is in its
natural environment, the material has been synthetically altered or
synthetically produced
by deliberate human intervention and/or placed at a different location within
the cell. The
synthetic alteration or creation of the material can be performed on the
material within or
apart from its natural state. For example, a naturally-occurring nucleic acid
becomes an
isolated nucleic acid if it is altered or produced by non-natural, synthetic
methods, or if it is
transcribed from DNA which has been altered or produced by non-natural,
synthetic
methods. The isolated nucleic acid may also be produced by the synthetic
rearrangement
("shuffling") of a part or parts of one or more allelic forms of the gene of
interest.
Likewise, a naturally-occurring nucleic acid (e.g., a promoter) becomes
isolated if it is
introduced to a different locus of the genome. Nucleic acids which are
"isolated," as
defined herein, are also referred to as "heterologous" nucleic acids. See,
e.g., Compounds
and Methods for Site Directed Mutagenesis in Eukaryotic Cells, Kmiec, U.S.
Patent No.
5,565,350; In Vivo Homologous Sequence Targeting in Eukaryotic Cells, Zarling
et al.,
WO 93/22443 (PCT/US93/03868).
As used herein, "nucleic acid" includes reference to a deoxyribonucleotide or
ribonucleotide polymer, or chimeras thereof, in either single- or double-
stranded form, and
unless otherwise limited, encompasses known analogues having the essential
nature of
natural nucleotides in that they hybridize to single-stranded nucleic acids in
a manner
similar to naturally occurring nucleotides (e.g., peptide nucleic acids).
By "nucleic acid library" is meant a collection of isolated DNA or RNA
molecules
which comprise and substantially represent the entire transcribed fraction of
a genome of a
specified organism, tissue, or of a cell type from that organism. Construction
of exemplary
nucleic acid libraries, such as genomic and cDNA libraries, is taught in
standard molecular biology references such as Berger and Kimmel, Guide to
Molecular Cloning Techniques, Methods in Enzymology, Vol. 152, Academic Press,
Inc., San Diego, CA (Berger); Sambrook et al., Molecular Cloning - A
Laboratory Manual, 2nd ed., Vol. 1-3 (1989); and Current Protocols in
Molecular Biology, F.M. Ausubel et al., Eds., Current Protocols, a joint
venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.
(1994).
As used herein "operably linked" includes reference to a functional linkage
between
a promoter and a second sequence, wherein the promoter sequence initiates and
mediates
transcription of the DNA sequence corresponding to the second sequence.
Generally,
operably linked means that the nucleic acid sequences being linked are
contiguous and,
where necessary to join two protein coding regions, contiguous and in the same
reading
frame.
As used herein, the term "plant" includes reference to whole plants, plant
parts or
organs (e.g., leaves, stems, roots, etc.), plant cells, seeds and progeny of
same. Plant cell, as
used herein, further includes, without limitation, cells obtained from or
found in: seeds,
suspension cultures, embryos, meristematic regions, callus tissue, leaves,
roots, shoots,
gametophytes, sporophytes, pollen, and microspores. Plant cells can also be
understood to
include modified cells, such as protoplasts, obtained from the aforementioned
tissues. The
class of plants which can be used in the methods of the invention is generally
as broad as
the class of higher plants amenable to transformation techniques, including
both
monocotyledonous and dicotyledonous plants. A particularly preferred plant is
Zea mays.
As used herein, "polynucleotide" includes reference to a
deoxyribopolynucleotide,
ribopolynucleotide, or chimeras or analogs thereof that have the essential
nature of a natural
deoxy- or ribo- nucleotide in that they hybridize, under stringent
hybridization conditions,
to substantially the same nucleotide sequence as naturally occurring
nucleotides and/or
allow translation into the same amino acid(s) as the naturally occurring
nucleotide(s). A
polynucleotide can be full-length or a subsequence of a native or heterologous
structural or
regulatory gene. Unless otherwise indicated, the term includes reference to
the specified
sequence as well as the complementary sequence thereof. Thus, DNAs or RNAs
with
backbones modified for stability or for other reasons are "polynucleotides" as
that term is
intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as
inosine, or
modified bases, such as tritylated bases, to name just two examples, are
polynucleotides as the
term is used herein. It will be appreciated that a great variety of
modifications have been made
to DNA and RNA that serve many useful purposes known to those of skill in the
art. The term
polynucleotide as it is employed herein embraces such chemically,
enzymatically or
metabolically modified forms of polynucleotides, as well as the chemical forms
of DNA and
RNA characteristic of viruses and cells, including among other things, simple
and complex
cells.
The terms "polypeptide", "peptide" and "protein" are used interchangeably
herein to
refer to a polymer of amino acid residues. The terms apply to amino acid
polymers in
which one or more amino acid residue is an artificial chemical analogue of a
corresponding
naturally occurring amino acid, as well as to naturally occurring amino acid
polymers. The
essential nature of such analogues of naturally occurring amino acids is
that, when
incorporated into a protein, that protein is specifically reactive to
antibodies elicited to the
same protein but consisting entirely of naturally occurring amino acids. The
terms
"polypeptide", "peptide" and "protein" are also inclusive of modifications
including, but not
limited to, glycosylation, lipid attachment, sulfation, gamma-carboxylation of
glutamic acid
residues, hydroxylation and ADP-ribosylation. Further, this invention
contemplates the use
of both the methionine-containing and the methionine-less amino terminal
variants of the
protein of the invention.
As used herein "promoter" includes reference to a region of DNA upstream from
the
start of transcription and involved in recognition and binding of RNA
polymerase and other
proteins to initiate transcription. A "plant promoter" is a promoter capable
of initiating
transcription in plant cells whether or not its origin is a plant cell.
Exemplary plant
promoters include, but are not limited to, those that are obtained from
plants, plant viruses,
and bacteria which comprise genes expressed in plant cells such as
Agrobacterium or
Rhizobium. Examples of promoters under developmental control include
promoters that
preferentially initiate transcription in certain tissues, such as leaves,
roots, or seeds. Such
promoters are referred to as "tissue preferred". Promoters which initiate
transcription only
in certain tissue are referred to as "tissue specific". A "cell type" specific
promoter
primarily drives expression in certain cell types in one or more organs, for
example,
vascular cells in roots or leaves. An "inducible" or "repressible" promoter is
a promoter
which is under environmental control. Examples of environmental conditions
that may
affect transcription by inducible promoters include anaerobic conditions or
the presence of
light. Tissue specific, tissue preferred, cell type specific, and inducible
promoters constitute
the class of "non-constitutive" promoters. A "constitutive" promoter is a
promoter which is
active under most environmental conditions.
As used herein "recombinant" includes reference to a cell or vector, that has
been
modified by the introduction of a heterologous nucleic acid or that the cell
is derived from a
cell so modified. Thus, for example, recombinant cells express genes that are
not found in
identical form within the native (non-recombinant) form of the cell or express
native genes
that are otherwise abnormally expressed, under-expressed or not expressed at
all as a result
of human intervention. The term "recombinant" as used herein does not
encompass the
alteration of the cell or vector by naturally occurring events (e.g.,
spontaneous mutation,
natural transformation/transduction/transposition) such as those occurring
without human
intervention.
As used herein, a "recombinant expression cassette" is a nucleic acid
construct,
generated recombinantly or synthetically, with a series of specified nucleic
acid elements
which permit transcription of a particular nucleic acid in a host cell. The
recombinant
expression cassette can be incorporated into a plasmid, chromosome,
mitochondrial DNA,
plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant
expression
cassette portion of an expression vector includes, among other sequences, a
nucleic acid to
be transcribed, and a promoter.
The terms "residue", "amino acid residue", and "amino acid" are used
interchangeably herein to refer to an amino acid that is incorporated into a
protein,
polypeptide, or peptide (collectively "protein"). The amino acid may be a
naturally
occurring amino acid and, unless otherwise limited, may encompass non-natural
analogs of
natural amino acids that can function in a similar manner as naturally
occurring amino
acids.
The term "selectively hybridizes" includes reference to hybridization, under
stringent hybridization conditions, of a nucleic acid sequence to a specified
nucleic acid
target sequence to a detectably greater degree (e.g., at least 2-fold over
background) than its
hybridization to non-target nucleic acid sequences and to the substantial
exclusion of non-
target nucleic acids. Selectively hybridizing sequences typically have about
at least 80%
sequence identity, preferably 90% sequence identity, and most preferably 100%
sequence
identity (i.e., complementary) with each other.
The term "stringent conditions" or "stringent hybridization conditions"
includes
reference to conditions under which a probe will selectively hybridize to its
target sequence,
to a detectably greater degree than to other sequences (e.g., at least 2-fold
over background).
Stringent conditions are sequence-dependent and will be different in different
circumstances. By controlling the stringency of the hybridization and/or
washing conditions, target sequences can be identified which are 100%
complementary to the probe (homologous probing). Alternatively, stringency
conditions can be adjusted
to allow some
mismatching in sequences so that lower degrees of similarity are detected
(heterologous
probing). Generally, a probe is less than about 1000 nucleotides in length,
optionally less
than 500 nucleotides in length.
Typically, stringent conditions will be those in which the salt concentration
is less
than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration
(or other salts)
at pH 7.0 to 8.3 and the temperature is at least about 30°C for short
probes (e.g., 10 to 50
nucleotides) and at least about 60°C for long probes (e.g., greater
than 50 nucleotides).
Stringent conditions may also be achieved with the addition of destabilizing
agents such as
formamide. Exemplary low stringency conditions include hybridization with a
buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl
sulphate) at 37°C, and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl/0.3 M
trisodium citrate) at 50 to 55°C. Exemplary moderate stringency conditions
include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37°C, and a
wash in 0.5X to 1X SSC at 55 to 60°C. Exemplary high stringency conditions
include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash
in 0.1X SSC at 60 to 65°C.
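Because the wash conditions above are given as multiples of SSC, it can help to convert them to a sodium ion molarity. The sketch below uses only the 20X SSC composition stated above and the fact that trisodium citrate contributes three sodium ions per formula unit; the function name and the table of exemplary washes are hypothetical conveniences.

```python
# Composition of 20X SSC as stated in the text (molar).
NACL_20X = 3.0
CITRATE_20X = 0.3  # trisodium citrate: three Na+ per formula unit

def sodium_molarity(ssc_x):
    """Total Na+ molarity of an SSC solution at the given strength (e.g. 0.1 for 0.1X)."""
    return ssc_x / 20.0 * (NACL_20X + 3 * CITRATE_20X)

# Exemplary wash ranges from the text: (SSC strength range, temperature range in deg C).
EXEMPLARY_WASHES = {
    "low": ((1.0, 2.0), (50, 55)),
    "moderate": ((0.5, 1.0), (55, 60)),
    "high": ((0.1, 0.1), (60, 65)),
}

for name, ((ssc_lo, ssc_hi), (t_lo, t_hi)) in EXEMPLARY_WASHES.items():
    print(name, round(sodium_molarity(ssc_lo), 4), round(sodium_molarity(ssc_hi), 4), t_lo, t_hi)
```

Lower salt and higher temperature both raise stringency, which is why the high stringency wash pairs the most dilute SSC with the warmest temperature range.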
Specificity is typically the function of post-hybridization washes, the
critical factors
being the ionic strength and temperature of the final wash solution. For DNA-
DNA
hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl,
Anal. Biochem., 138:267-284 (1984): Tm = 81.5 °C + 16.6 (log M) + 0.41 (%GC)
- 0.61 (% form)
- 500/L; where M is the molarity of monovalent cations, %GC is the percentage
of
guanosine and cytosine nucleotides in the DNA, % form is the percentage of
formamide in
the hybridization solution, and L is the length of the hybrid in base pairs.
The Tm is the
temperature (under defined ionic strength and pH) at which 50% of a
complementary target
sequence hybridizes to a perfectly matched probe. Tm is reduced by about
1°C for each 1%
of mismatching; thus, Tm, hybridization and/or wash conditions can be adjusted
to hybridize
to sequences of the desired identity. For example, if sequences with >90%
identity are
sought, the Tm can be decreased 10°C. Generally, stringent conditions
are selected to be
about 5°C lower than the thermal melting point (Tm) for the specific
sequence and its
complement at a defined ionic strength and pH. However, severely stringent
conditions can
utilize a hybridization and/or wash at 1, 2, 3, or 4 °C lower than the
thermal melting point
(Tm); moderately stringent conditions can utilize a hybridization and/or wash
at 6, 7, 8, 9, or 10 °C lower than the thermal melting point (Tm); low
stringency conditions can utilize a hybridization and/or wash at 11, 12, 13,
14, 15, or 20 °C lower than the thermal melting point (Tm). Using the
equation, hybridization and wash compositions, and desired Tm, those
of ordinary skill will understand that variations in the stringency of
hybridization and/or
wash solutions are inherently described. If the desired degree of mismatching
results in a
Tm of less than 45 °C (aqueous solution) or 32 °C (formamide
solution) it is preferred to
increase the SSC concentration so that a higher temperature can be used.
Hybridization
and/or wash conditions can be applied for at least 10, 30, 60, 90, 120, or 240
minutes. An
extensive guide to the hybridization of nucleic acids is found in Tijssen,
Laboratory
Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic
Acid
Probes, Part I, Chapter 2 "Overview of principles of hybridization and the
strategy of
nucleic acid probe assays", Elsevier, New York (1993); and Current Protocols
in Molecular Biology, Chapter 2, Ausubel, et al., Eds., Greene Publishing and
Wiley-Interscience, New
York (1995).
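As a hedged illustration of the temperature offsets described above, the following Python sketch lowers a melting temperature for an expected degree of mismatching and names the stringency class of a given hybridization or wash temperature. The function names, and the handling of offsets that fall between or outside the listed values, are assumptions made only for illustration. The perfect-match Tm is taken as an input rather than recomputed.

```python
def mismatch_adjusted_tm(tm_perfect_c, percent_identity):
    """Lower the Tm by about 1 degree C for each 1% of mismatching."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent_identity must be between 0 and 100")
    return tm_perfect_c - (100.0 - percent_identity)


def stringency_class(tm_c, wash_temp_c):
    """Name the stringency class for a hybridization or wash temperature.

    Boundaries follow the offsets listed in the text. The handling of
    offsets outside the listed values is an illustrative assumption.
    """
    offset = tm_c - wash_temp_c  # degrees C below the Tm
    if offset < 1:
        return "above listed range"
    if offset <= 4:
        return "severely stringent"
    if offset <= 5:
        return "stringent"
    if offset <= 10:
        return "moderately stringent"
    if offset <= 20:
        return "low stringency"
    return "below listed range"
```

Under these assumptions, a probe with a perfect-match Tm of 72 °C and a sought identity of 90% has an adjusted Tm of 62 °C, and a wash at 57 °C against a Tm of 72 °C falls in the low stringency class.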
As used herein, "transgenic plant" includes reference to a plant which
comprises
within its genome a heterologous polynucleotide. Generally, the heterologous
polynucleotide is stably integrated within the genome such that the
polynucleotide is passed
on to successive generations. The heterologous polynucleotide may be
integrated into the
genome alone or as part of a recombinant expression cassette. "Transgenic" is
used herein to
include any cell, cell line, callus, tissue, plant part or plant, the genotype
of which has been
altered by the presence of heterologous nucleic acid including those
transgenics initially so
altered as well as those created by sexual crosses or asexual propagation from
the initial
transgenic. The term "transgenic" as used herein does not encompass the
alteration of the
genome (chromosomal or extra-chromosomal) by conventional plant breeding
methods or
by naturally occurring events such as random cross-fertilization,
non-recombinant viral
infection, non-recombinant bacterial transformation, non-recombinant
transposition, or
spontaneous mutation.
As used herein, "vector" includes reference to a nucleic acid used in
introduction of
a polynucleotide of the present invention into a host cell. Vectors are often
replicons.
Expression vectors permit transcription of a nucleic acid inserted therein.
The following terms are used to describe the sequence relationships between a
polynucleotide/polypeptide of the present invention with a reference
polynucleotide/polypeptide: (a) "reference sequence", (b) "comparison window",
(c)
"sequence identity", and (d) "percentage of sequence identity".
(a) As used herein, "reference sequence" is a defined sequence used as a basis
for
sequence comparison with a polynucleotide/polypeptide of the present
invention. A
reference sequence may be a subset or the entirety of a specified sequence;
for example, as
a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene
sequence.
(b) As used herein, "comparison window" includes reference to a contiguous and
specified segment of a polynucleotide/polypeptide sequence, wherein the
polynucleotide/polypeptide sequence may be compared to a reference sequence
and wherein
the portion of the polynucleotide/polypeptide sequence in the comparison
window may
comprise additions or deletions (i.e., gaps) compared to the reference
sequence (which does
not comprise additions or deletions) for optimal alignment of the two
sequences. Generally,
the comparison window is at least 20 contiguous nucleotides/amino acid
residues in length,
and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art
understand that to
avoid a high similarity to a reference sequence due to inclusion of gaps in
the
polynucleotide/polypeptide sequence, a gap penalty is typically introduced and
is subtracted
from the number of matches.
Methods of alignment of sequences for comparison are well-known in the art.
Optimal alignment of sequences for comparison may be conducted by the local
homology
algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981); by the
homology
alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970); by
the search
for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85: 2444
(1988); by
computerized implementations of these algorithms, including, but not limited
to: CLUSTAL
in the PC/Gene program by Intelligenetics, Mountain View, California; GAP,
BESTFIT,
BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics
Computer Group (GCG), 575 Science Dr., Madison, Wisconsin, USA; the CLUSTAL
program is well described by Higgins and Sharp, Gene 73: 237-244 (1988);
Higgins and
Sharp, CABIOS 5: 151-153 (1989); Corpet, et al., Nucleic Acids Research 16:
10881-90
(1988); Huang, et al., Computer Applications in the Biosciences 8: 155-65
(1992), and
Pearson, et al., Methods in Molecular Biology 24: 307-331 (1994).
The BLAST family of programs which can be used for database similarity
searches
includes: BLASTN for nucleotide query sequences against nucleotide database
sequences;
BLASTX for nucleotide query sequences against protein database sequences;
BLASTP for
protein query sequences against protein database sequences; TBLASTN for
protein query
sequences against nucleotide database sequences; and TBLASTX for nucleotide
query
sequences against nucleotide database sequences. See, Current Protocols in
Molecular
Biology, Chapter 19, Ausubel, et al., Eds., Greene Publishing and
Wiley-Interscience, New
York (1995); Altschul et al., J. Mol. Biol., 215:403-410 (1990); and, Altschul
et al., Nucleic
Acids Res. 25:3389-3402 (1997).
Software for performing BLAST analyses is publicly available, e.g., through
the
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
This
algorithm involves first identifying high scoring sequence pairs (HSPs) by
identifying short
words of length W in the query sequence, which either match or satisfy some
positive-
valued threshold score T when aligned with a word of the same length in a
database
sequence. T is referred to as the neighborhood word score threshold. These
initial
neighborhood word hits act as seeds for initiating searches to find longer
HSPs containing
them. The word hits are then extended in both directions along each sequence
for as far as
the cumulative alignment score can be increased. Cumulative scores are
calculated using,
for nucleotide sequences, the parameters M (reward score for a pair of
matching residues;
always > 0) and N (penalty score for mismatching residues; always < 0). For
amino acid
sequences, a scoring matrix is used to calculate the cumulative score.
Extension of the
word hits in each direction is halted when: the cumulative alignment score
falls off by the
quantity X from its maximum achieved value; the cumulative score goes to zero
or below,
due to the accumulation of one or more negative-scoring residue alignments; or
the end of
either sequence is reached. The BLAST algorithm parameters W, T, and X
determine the
sensitivity and speed of the alignment. The BLASTN program (for nucleotide
sequences)
uses as defaults a wordlength (W) of 11, an expectation (E) of
10, a cutoff of 100, M=5, N=-4, and a comparison of both strands. For amino
acid
sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an
expectation (E)
of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1989) Proc.
Natl.
Acad. Sci. USA 89:10915).
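The extension step described above can be sketched in Python as follows. This is a simplified, ungapped extension in one direction only, written for nucleotide sequences with the reward M and penalty N given in the text. The function name, the starting positions, and the drop-off value X=20 are illustrative assumptions and are not taken from any BLAST implementation.

```python
def extend_hit(query, subject, q_pos, s_pos, reward=5, penalty=-4, x_drop=20):
    """Extend a word hit to the right without gaps.

    Returns (best_score, best_length). Extension halts when the running
    score falls x_drop below its maximum, goes to zero or below, or the
    end of either sequence is reached.
    """
    score = 0
    best_score = 0
    best_length = 0
    length = 0
    while q_pos + length < len(query) and s_pos + length < len(subject):
        if query[q_pos + length] == subject[s_pos + length]:
            score += reward   # M: reward for a pair of matching residues
        else:
            score += penalty  # N: penalty for mismatching residues
        length += 1
        if score > best_score:
            best_score = score
            best_length = length
        if score <= 0 or best_score - score >= x_drop:
            break
    return best_score, best_length
```

A full search would repeat the same extension to the left of the word hit and, for amino acid sequences, would replace the reward and penalty with scoring matrix lookups.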
In addition to calculating percent sequence identity, the BLAST algorithm also
performs a statistical analysis of the similarity between two sequences (see,
e.g., Karlin &
Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of
similarity
provided by the BLAST algorithm is the smallest sum probability (P(N)), which
provides an
indication of the probability by which a match between two nucleotide or amino
acid
sequences would occur by chance.
BLAST searches assume that proteins can be modeled as random sequences.
However, many real proteins comprise regions of nonrandom sequences which may
be
homopolymeric tracts, short-period repeats, or regions enriched in one or more
amino acids.
Such low-complexity regions may be aligned between unrelated proteins even
though other
regions of the protein are entirely dissimilar. A number of low-complexity
filter programs
can be employed to reduce such low-complexity alignments. For example, the SEG
(Wootton and Federhen, Comput. Chem., 17:149-163 (1993)) and XNU (Claverie and
States,
Comput. Chem., 17:191-201 (1993)) low-complexity filters can be employed
alone or in
combination.
Unless otherwise stated, nucleotide and protein identity/similarity values
provided
herein are calculated using GAP (GCG Version 10) under default values.
GAP (Global Alignment Program) can also be used to compare a polynucleotide or
polypeptide of the present invention with a reference sequence. GAP uses the
algorithm of
Needleman and Wunsch (J. Mol. Biol. 48: 443-453, 1970) to find the alignment
of two
complete sequences that maximizes the number of matches and minimizes the
number of
gaps. GAP considers all possible alignments and gap positions and creates the
alignment
with the largest number of matched bases and the fewest gaps. It allows for
the provision of
a gap creation penalty and a gap extension penalty in units of matched bases.
GAP must
make a profit of gap creation penalty number of matches for each gap it
inserts. If a gap
extension penalty greater than zero is chosen, GAP must, in addition, make a
profit for each
gap inserted of the length of the gap times the gap extension penalty. Default
gap creation
penalty values and gap extension penalty values in Version 10 of the
Wisconsin Genetics
Software Package for protein sequences are 8 and 2, respectively. For
nucleotide
sequences the default gap creation penalty is 50 while the default gap
extension penalty is 3.
The gap creation and gap extension penalties can be expressed as an integer
selected from
the group of integers consisting of from 0 to 100. Thus, for example, the gap
creation and
gap extension penalties can each independently be: 0, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 15, 20, 30,
40, 50, 60 or greater.
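The role of the gap creation and gap extension penalties can be sketched with a small global alignment scorer in Python. This is not the GCG implementation. It is a minimal Needleman and Wunsch style dynamic program with affine gaps, in which a gap of length k costs the creation penalty plus k times the extension penalty. The match and mismatch values, the function name, and the penalization of end gaps are assumptions made for illustration; only the score is returned, not the alignment.

```python
NEG_INF = float("-inf")


def global_alignment_score(a, b, match=10, mismatch=0,
                           gap_creation=50, gap_extension=3):
    """Best global alignment score of a and b under affine gap penalties."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_creation + i * gap_extension)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_creation + j * gap_extension)
    open_cost = gap_creation + gap_extension  # cost of the first gap position
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = pair + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - open_cost,
                          Y[i - 1][j] - open_cost,
                          X[i - 1][j] - gap_extension)
            Y[i][j] = max(M[i][j - 1] - open_cost,
                          X[i][j - 1] - open_cost,
                          Y[i][j - 1] - gap_extension)
    return max(M[n][m], X[n][m], Y[n][m])
```

With a creation penalty of 2 and an extension penalty of 1, aligning ACGT to ACT is best done by inserting one gap opposite the G, which keeps three matches (30) and pays 3 for the gap, for a score of 27.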
GAP presents one member of the family of best alignments. There may be many
members of this family, but no other member has a better quality. GAP displays
four
figures of merit for alignments: Quality, Ratio, Identity, and Similarity. The
Quality is the
metric maximized in order to align the sequences. Ratio is the quality divided
by the
number of bases in the shorter segment. Percent Identity is the percent of the
symbols that
actually match. Percent Similarity is the percent of the symbols that are
similar. Symbols
that are across from gaps are ignored. A similarity is scored when the scoring
matrix value
for a pair of symbols is greater than or equal to 0.50, the similarity
threshold. The scoring
matrix used in Version 10 of the Wisconsin Genetics Software Package is
BLOSUM62 (see
Henikoff & Henikoff (1989) Proc. Natl. Acad. Sci. USA 89:10915).
Multiple alignment of the sequences can be performed using the CLUSTAL method
of alignment (Higgins and Sharp (1989) CABIOS. 5:151-153) with the default
parameters
(GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise
alignments using the CLUSTAL method are KTUPLE=1, GAP PENALTY=3,
WINDOW=5 and DIAGONALS SAVED=5.
(c) As used herein, "sequence identity" or "identity" in the context of two
nucleic
acid or polypeptide sequences includes reference to the residues in the two
sequences which
are the same when aligned for maximum correspondence over a specified
comparison
window. When percentage of sequence identity is used in reference to proteins
it is
recognized that residue positions which are not identical often differ by
conservative amino
acid substitutions, where amino acid residues are substituted for other amino
acid residues
with similar chemical properties (e.g. charge or hydrophobicity) and therefore
do not
change the functional properties of the molecule. Where sequences differ in
conservative
substitutions, the percent sequence identity may be adjusted upwards to
correct for the
conservative nature of the substitution. Sequences which differ by such
conservative
substitutions are said to have "sequence similarity" or "similarity". Means
for making this
adjustment are well-known to those of skill in the art. Typically this
involves scoring a
conservative substitution as a partial rather than a full mismatch, thereby
increasing the
percentage sequence identity. Thus, for example, where an identical amino acid
is given a
score of 1 and a non-conservative substitution is given a score of zero, a
conservative
substitution is given a score between zero and 1. The scoring of conservative
substitutions
is calculated, e.g., according to the algorithm of Meyers and Miller,
Computer Applic. Biol.
Sci., 4: 11-17 (1988) e.g., as implemented in the program PC/GENE
(Intelligenetics,
Mountain View, California, USA).
(d) As used herein, "percentage of sequence identity" means the value
determined by
comparing two optimally aligned sequences over a comparison window, wherein
the
portion of the polynucleotide sequence in the comparison window may comprise
additions
or deletions (i.e., gaps) as compared to the reference sequence (which does
not comprise
additions or deletions) for optimal alignment of the two sequences. The
percentage is
calculated by determining the number of positions at which the identical
nucleic acid base
or amino acid residue occurs in both sequences to yield the number of matched
positions,
dividing the number of matched positions by the total number of positions in
the window of
comparison and multiplying the result by 100 to yield the percentage of
sequence identity.
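The calculation in (d) can be written out as a short Python sketch. It assumes the two sequences have already been optimally aligned over the comparison window and are supplied as equal-length strings, with gaps written as "-". The function name and the gap symbol are illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an aligned comparison window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Matched positions: the identical base or residue occurs in both sequences.
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Divide by the total number of positions in the window, then scale to percent.
    return 100.0 * matched / len(aligned_a)
```

For the aligned pair ACGT-A and ACCTTA, four of the six window positions match, so the percentage of sequence identity is about 66.7.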
Utilities
The present invention provides, among other things, compositions and methods
for
modulating (i.e., increasing or decreasing) the level of polynucleotides and
polypeptides of
the present invention in plants. In particular, the polynucleotides and
polypeptides of the
present invention can be expressed temporally or spatially, e.g., at
developmental stages, in
tissues, and/or in quantities, which are uncharacteristic of non-recombinantly
engineered
plants.
The present invention also provides isolated nucleic acids comprising
polynucleotides of sufficient length and complementarity to a polynucleotide
of the present
invention to use as probes or amplification primers in the detection,
quantitation, or
isolation of gene transcripts. For example, isolated nucleic acids of the
present invention
can be used as probes in detecting deficiencies in the level of mRNA in
screenings for
desired transgenic plants, for detecting mutations in the gene (e.g.,
substitutions, deletions,
or additions), for monitoring upregulation of expression or changes in enzyme
activity in
screening assays of compounds, for detection of any number of allelic variants
(polymorphisms), orthologs, or paralogs of the gene, or for site directed
mutagenesis in
eukaryotic cells (see, e.g., U.S. Patent No. 5,565,350). The isolated nucleic
acids of the
present invention can also be used for recombinant expression of their encoded
polypeptides, or for use as immunogens in the preparation and/or screening of
antibodies.
The isolated nucleic acids of the present invention can also be employed for
use in sense or
antisense suppression of one or more genes of the present invention in a host
cell, tissue, or
plant. Attachment of chemical agents which bind, intercalate, cleave and/or
crosslink to the
isolated nucleic acids of the present invention can also be used to modulate
transcription or
translation.
The present invention also provides isolated proteins comprising a polypeptide
of
the present invention (e.g., preproenzyme, proenzyme, or enzymes). The present
invention
also provides proteins comprising at least one epitope from a polypeptide of
the present
invention. The proteins of the present invention can be employed in assays for
enzyme
agonists or antagonists of enzyme function, or for use as immunogens or
antigens to obtain
antibodies specifically immunoreactive with a protein of the present invention.
Such
antibodies can be used in assays for expression levels, for identifying and/or
isolating
nucleic acids of the present invention from expression libraries, for
identification of
homologous polypeptides from other species, or for purification of
polypeptides of the
present invention.
The isolated nucleic acids and polypeptides of the present invention can be
used
over a broad range of plant types, particularly monocots such as the species
of the family
Gramineae including Hordeum, Secale, Oryza, Triticum, Sorghum (e.g., S.
bicolor) and Zea
(e.g., Z. mays), and dicots such as Glycine.
The isolated nucleic acid and proteins of the present invention can also be
used in
species from the genera: Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus,
Medicago,
Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium,
Manihot, Daucus,
Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura,
Hyoscyamus,
Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana,
Cichorium, Helianthus,
Lactuca, Bromus, Asparagus, Antirrhinum, Heterocallis, Nemesis, Pelargonium,
Panicum,
Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Pisum,
Phaseolus,
Lolium, and Avena.
Nucleic Acids
The present invention provides, among other things, isolated nucleic acids of
RNA,
DNA, and analogs and/or chimeras thereof, comprising a polynucleotide of the
present
invention.
A polynucleotide of the present invention is inclusive of those in Table 1
and:
(a) an isolated polynucleotide encoding a polypeptide of the present invention
such
as those referenced in Table 1, including exemplary polynucleotides of the
present
invention;
(b) an isolated polynucleotide which is the product of amplification from a
plant
nucleic acid library using primer pairs which selectively hybridize under
stringent
conditions to loci within a polynucleotide of the present invention;
(c) an isolated polynucleotide which selectively hybridizes to a
polynucleotide of (a)
or (b);
(d) an isolated polynucleotide having a specified sequence identity with
polynucleotides of (a), (b), or (c);
(e) an isolated polynucleotide encoding a protein having a specified number of
contiguous amino acids from a prototype polypeptide, wherein the protein is
specifically
recognized by antisera elicited by presentation of the protein and wherein the
protein does
not detectably immunoreact to antisera which has been fully immunosorbed with
the
protein;
(f) complementary sequences of polynucleotides of (a), (b), (c), (d), or (e);
and
(g) an isolated polynucleotide comprising at least a specific number of
contiguous
nucleotides from a polynucleotide of (a), (b), (c), (d), (e), or (f);
(h) an isolated polynucleotide from a full-length enriched cDNA library having
the
physico-chemical property of selectively hybridizing to a polynucleotide of
(a), (b), (c), (d),
(e), (f), or (g);
(i) an isolated polynucleotide made by the process of: 1) providing a full-
length
enriched nucleic acid library, 2) selectively hybridizing the polynucleotide
to a
polynucleotide of (a), (b), (c), (d), (e), (f), (g), or (h), thereby
isolating the polynucleotide
from the nucleic acid library.
A. Polynucleotides Encoding A Polypeptide of the Present Invention
As indicated in (a), above, the present invention provides isolated nucleic
acids
comprising a polynucleotide of the present invention, wherein the
polynucleotide encodes a
polypeptide of the present invention. Every nucleic acid sequence herein that
encodes a
polypeptide also, by reference to the genetic code, describes every possible
silent variation
of the nucleic acid. One of ordinary skill will recognize that each codon in a
nucleic acid
(except AUG, which is ordinarily the only codon for methionine; and UGG,
which is
ordinarily the only codon for tryptophan) can be modified to yield a
functionally identical
molecule. Thus, each silent variation of a nucleic acid which encodes a
polypeptide of the
present invention is implicit in each described polypeptide sequence and is
within the scope
of the present invention. Accordingly, the present invention includes
polynucleotides of the
present invention and polynucleotides encoding a polypeptide of the present
invention.
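The extent of silent variation implied by the genetic code can be illustrated with a short Python sketch. It builds the standard genetic code as an RNA codon table and counts how many distinct coding sequences encode a given peptide. The names used here are illustrative assumptions; the sketch uses the standard code only and does not account for organism-specific codon usage.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for the 64 codons in UCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ("*" marks a stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_coding_sequences(peptide):
    """Count the distinct nucleic acid sequences that encode the peptide."""
    total = 1
    for aa in peptide:
        total *= len(codons_for(aa))
    return total
```

Methionine and tryptophan each have a single codon, so the dipeptide MW has exactly one coding sequence, whereas the tripeptide MLS has 1 x 6 x 6 = 36, that is, 35 silent variations of any one of them.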
B. Polynucleotides Amplified from a Plant Nucleic Acid Library
As indicated in (b), above, the present invention provides an isolated nucleic
acid
comprising a polynucleotide of the present invention, wherein the
polynucleotides are
amplified, under nucleic acid amplification conditions, from a plant nucleic
acid library.
Nucleic acid amplification conditions for each of the variety of amplification
methods are
well known to those of ordinary skill in the art. The plant nucleic acid
library can be
constructed from a monocot such as a cereal crop. Exemplary cereals include
maize,
sorghum, alfalfa, canola, wheat, or rice. The plant nucleic acid library can
also be
constructed from a dicot such as soybean. Zea mays lines B73, PHRE1, A632,
BMS-P2#10, W23, and Mo17 are known and publicly available. Other publicly known
and
available maize lines can be obtained from the Maize Genetics Cooperation
(Urbana, IL).
Wheat lines are available from the Wheat Genetics Resource Center (Manhattan,
KS).
The nucleic acid library may be a cDNA library, a genomic library, or a
library
generally constructed from nuclear transcripts at any stage of intron
processing. cDNA
libraries can be normalized to increase the representation of relatively rare
cDNAs. In
optional embodiments, the cDNA library is constructed using an enriched full-
length cDNA
synthesis method. Examples of such methods include Oligo-Capping (Maruyama, K.
and
Sugano, S. Gene 138: 171-174, 1994), Biotinylated CAP Trapper (Carninci, et
al. Genomics
37: 327-336, 1996), and CAP Retention Procedure (Edery, E., Chu, L.L., et al.
Molecular
and Cellular Biology 15: 3363-3371, 1995). Rapidly growing tissues or rapidly
dividing
cells are preferred for use as an mRNA source for construction of a cDNA
library. Growth
stages of maize are described in "How a Corn Plant Develops," Special Report
No. 48, Iowa
State University of Science and Technology Cooperative Extension Service,
Ames, Iowa,
Reprinted February 1993.
A polynucleotide of this embodiment (or subsequences thereof) can be obtained,
for
example, by using amplification primers which are selectively hybridized and
primer
extended, under nucleic acid amplification conditions, to at least two sites
within a
polynucleotide of the present invention, or to two sites within the nucleic
acid which flank
and comprise a polynucleotide of the present invention, or to a site within a
polynucleotide
of the present invention and a site within the nucleic acid which comprises
it. Methods for
obtaining 5' and/or 3' ends of a vector insert are well known in the art. See,
e.g., RACE
(Rapid Amplification of Complementary Ends) as described in Frohman, M. A., in
PCR
Protocols: A Guide to Methods and Applications, M. A. Innis, D. H. Gelfand, J.
J. Sninsky,
T. J. White, Eds. (Academic Press, Inc., San Diego), pp. 28-38 (1990)); see
also, U.S. Pat.
No. 5,470,722, and Current Protocols in Molecular Biology, Unit 15.6,
Ausubel, et al.,
Eds., Greene Publishing and Wiley-Interscience, New York (1995); Frohman and
Martin,
Techniques 1:165 (1989).
Optionally, the primers are complementary to a subsequence of the target
nucleic
acid which they amplify but may have a sequence identity ranging from about
85% to 99%
relative to the polynucleotide sequence which they are designed to anneal to.
As those
skilled in the art will appreciate, the sites to which the primer pairs will
selectively
hybridize are chosen such that a single contiguous nucleic acid can be formed
under the
desired nucleic acid amplification conditions. The primer length in
nucleotides is selected
from the group of integers consisting of from at least 15 to 50. Thus, the
primers can be at
least 15, 18, 20, 25, 30, 40, or 50 nucleotides in length. Those of skill will
recognize that a
lengthened primer sequence can be employed to increase specificity of binding
(i.e.,
annealing) to a target sequence. A non-annealing sequence at the 5' end of a
primer (a
"tail") can be added, for example, to introduce a cloning site at the terminal
ends of the
amplicon.
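The primer constraints described above can be sketched as a small Python helper. It checks that the annealing region is 15 to 50 nucleotides long and prepends an optional non-annealing 5' tail. Applying the length range to the annealing region rather than to the whole tailed primer is an assumption made for illustration, as is the function name.

```python
def build_primer(annealing_seq, tail=""):
    """Return a primer made of an optional 5' tail plus the annealing region."""
    annealing_seq = annealing_seq.upper()
    tail = tail.upper()
    # Assumption: the 15 to 50 nucleotide range applies to the annealing region.
    if not 15 <= len(annealing_seq) <= 50:
        raise ValueError("annealing region should be 15 to 50 nucleotides")
    if set(annealing_seq + tail) - set("ACGT"):
        raise ValueError("primer may contain only A, C, G, and T")
    # The tail sits at the 5' end and does not anneal to the target.
    return tail + annealing_seq
```

For example, passing an 18-nucleotide annealing region with the tail GAATTC (the EcoRI recognition site, chosen here only as an example) returns a 24-nucleotide primer that introduces that cloning site at the end of the amplicon.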
The amplification products can be translated using expression systems well
known
to those of skill in the art. The resulting translation products can be
confirmed as
polypeptides of the present invention by, for example, assaying for the
appropriate catalytic
activity (e.g., specific activity and/or substrate specificity), or verifying
the presence of one
or more epitopes which are specific to a polypeptide of the present invention.
Methods for
protein synthesis from PCR derived templates are known in the art and
available
commercially. See, e.g., Amersham Life Sciences, Inc, Catalog '97, p.354.
The polynucleotides of the present invention include those amplified using the
following primer pairs:
SEQ ID NOS: 3 and 4, which yield an amplicon comprising a sequence having
substantial
identity to SEQ ID NO: 1; and
SEQ ID NOS: 7 and 8, which yield an amplicon comprising a sequence having
substantial
identity to SEQ ID NO: 5.
C. Polynucleotides Which Selectively Hybridize to a Polynucleotide of
(A) or (B)
As indicated in (c), above, the present invention provides isolated nucleic
acids
comprising polynucleotides of the present invention, wherein the
polynucleotides
selectively hybridize, under selective hybridization conditions, to a
polynucleotide of
sections (A) or (B) as discussed above. Thus, the polynucleotides of this
embodiment can
be used for isolating, detecting, and/or quantifying nucleic acids comprising
the
polynucleotides of (A) or (B). For example, polynucleotides of the present
invention can be
used to identify, isolate, or amplify partial or full-length clones in a
deposited library. In
some embodiments, the polynucleotides are genomic or cDNA sequences isolated
or
otherwise complementary to a cDNA from a dicot or monocot nucleic acid
library.
Exemplary species of monocots and dicots include, but are not limited to:
maize, canola,
soybean, cotton, wheat, sorghum, sunflower, alfalfa, oats, sugar cane, millet,
barley, and
rice. The cDNA library comprises at least 50% to 95% full-length sequences
(for example,
at least 50%, 60%, 70%, 80%, 90%, or 95% full-length sequences). The cDNA
libraries
can be normalized to increase the representation of rare sequences. See, e.g.,
U.S. Patent
No. 5,482,845. Low stringency hybridization conditions are typically, but not
exclusively,
employed with sequences having a reduced sequence identity relative to
complementary
sequences. Moderate and high stringency conditions can optionally be employed
for
sequences of greater identity. Low stringency conditions allow selective
hybridization of
sequences having about 70% to 80% sequence identity and can be employed to
identify
orthologous or paralogous sequences.
D. Polynucleotides Having a Specific Sequence Identity with the
Polynucleotides of (A), (B)
or (C)
As indicated in (d), above, the present invention provides isolated nucleic
acids
comprising polynucleotides of the present invention, wherein the
polynucleotides have a
specified identity at the nucleotide level to a polynucleotide as disclosed
above in sections
(A), (B), or (C), above. Identity can be calculated using, for example, the
BLAST,
CLUSTALW, or GAP algorithms under default conditions. The percentage of
identity to a
reference sequence is at least 50% and, rounded upwards to the nearest
integer, can be
expressed as an integer selected from the group of integers consisting of from
50 to 99.
Thus, for example, the percentage of identity to a reference sequence can be
at least 60%,
70%, 75%, 80%, 85%, 90%, or 95%.
Optionally, the polynucleotides of this embodiment will encode a polypeptide
that
will share an epitope with a polypeptide encoded by the polynucleotides of
sections (A),
(B), or (C). Thus, these polynucleotides encode a first polypeptide which
elicits production
of antisera comprising antibodies which are specifically reactive to a second
polypeptide
encoded by a polynucleotide of (A), (B), or (C). However, the first
polypeptide does not
bind to antisera raised against itself when the antisera has been fully
immunosorbed with the
first polypeptide. Hence, the polynucleotides of this embodiment can be used
to generate
antibodies for use in, for example, the screening of expression libraries for
nucleic acids
comprising polynucleotides of (A), (B), or (C), or for purification of, or in
immunoassays
for, polypeptides encoded by the polynucleotides of (A), (B), or (C). The
polynucleotides
of this embodiment comprise nucleic acid sequences which can be employed for
selective
hybridization to a polynucleotide encoding a polypeptide of the present
invention.
Screening polypeptides for specific binding to antisera can be conveniently
achieved
using peptide display libraries. This method involves the screening of large
collections of
peptides for individual members having the desired function or structure.
Antibody
screening of peptide display libraries is well known in the art. The displayed
peptide
sequences can be from 3 to 5000 or more amino acids in length, frequently from
5-100
amino acids long, and often from about 8 to 15 amino acids long. In addition
to direct
chemical synthetic methods for generating peptide libraries, several
recombinant DNA
methods have been described. One type involves the display of a peptide
sequence on the
surface of a bacteriophage or cell. Each bacteriophage or cell contains the
nucleotide
sequence encoding the particular displayed peptide sequence. Such methods are
described
in PCT patent publication Nos. 91/17271, 91/18980, 91/19818, and 93/08278.
Other
systems for generating libraries of peptides have aspects of both in vitro
chemical synthesis
and recombinant methods. See, PCT Patent publication Nos. 92/05258, 92/14843,
and
97/20078. See also, U.S. Patent Nos. 5,658,754; and 5,643,768. Peptide display
libraries,
vectors, and screening kits are commercially available from such suppliers as
Invitrogen
(Carlsbad, CA).
E. Polynucleotides Encoding a Protein Having a Subsequence from a
Prototype
Polypeptide and Cross-Reactive to the Prototype Polypeptide
As indicated in (e), above, the present invention provides isolated nucleic
acids
comprising polynucleotides of the present invention, wherein the
polynucleotides encode a
protein having a subsequence of contiguous amino acids from a prototype
polypeptide of
the present invention such as are provided in (a), above. The length of
contiguous amino
acids from the prototype polypeptide is selected from the group of integers
consisting of
from at least 10 to the number of amino acids within the prototype sequence.
Thus, for
example, the polynucleotide can encode a polypeptide having a subsequence
having at least
10, 15, 20, 25, 30, 35, 40, 45, or 50 contiguous amino acids from the
prototype polypeptide.
Further, the number of such subsequences encoded by a polynucleotide of the
instant
embodiment can be any integer selected from the group consisting of from 1 to
20, such as
2, 3, 4, or 5. The subsequences can be separated by any integer of nucleotides
from 1 to the
number of nucleotides in the sequence such as at least 5, 10, 15, 25, 50, 100,
or 200
nucleotides.
The proteins encoded by polynucleotides of this embodiment, when presented as
an
immunogen, elicit the production of polyclonal antibodies which specifically
bind to a
prototype polypeptide such as but not limited to, a polypeptide encoded by the
polynucleotide of (a) or (b), above. Generally, however, a protein encoded
by a
polynucleotide of this embodiment does not bind to antisera raised against the
prototype
polypeptide when the antisera has been fully immunosorbed with the prototype
polypeptide.
Methods of making and assaying for antibody binding specificity/affinity are
well known in
the art. Exemplary immunoassay formats include ELISA, competitive
immunoassays,
radioimmunoassays, Western blots, indirect immunofluorescent assays and the
like.
In a preferred assay method, fully immunosorbed and pooled antisera which is
elicited to the prototype polypeptide can be used in a competitive binding
assay to test the
protein. The concentration of the prototype polypeptide required to inhibit
50% of the
binding of the antisera to the prototype polypeptide is determined. If the
amount of the
protein required to inhibit binding is less than twice the amount of the
prototype protein,
then the protein is said to specifically bind to the antisera elicited to the
immunogen.
Accordingly, the proteins of the present invention embrace allelic variants,
conservatively
modified variants, and minor recombinant modifications to a prototype
polypeptide.
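The competitive-binding criterion described above can be sketched in a few lines. This is only an illustration: the function name and the idea of passing both amounts as plain numbers in the same units are assumptions made here, not part of the disclosure.

```python
def specifically_binds(test_amount: float, prototype_amount: float) -> bool:
    """Apply the 'less than twice' rule from the competitive binding assay.

    prototype_amount: amount of prototype polypeptide needed to inhibit 50%
        of antisera binding (hypothetical parameter name).
    test_amount: amount of the test protein needed for the same inhibition,
        expressed in the same units.
    """
    if test_amount <= 0 or prototype_amount <= 0:
        raise ValueError("inhibitor amounts must be positive")
    # The test protein qualifies when it needs less than twice the prototype amount.
    return test_amount < 2 * prototype_amount
```

A test protein requiring 1.5 units where the prototype requires 1.0 would pass under this rule, whereas one requiring 2.0 or more would not.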
A polynucleotide of the present invention optionally encodes a protein having
a
molecular weight as the non-glycosylated protein within 20% of the molecular
weight of the
full-length non-glycosylated polypeptides of the present invention. Molecular
weight can
be readily determined by SDS-PAGE under reducing conditions. Optionally, the
molecular
weight is within 15% of a full length polypeptide of the present invention,
more preferably
within 10% or 5%, and most preferably within 3%, 2%, or 1% of a full length
polypeptide
of the present invention.
Optionally, the polynucleotides of this embodiment will encode a protein
having a
specific enzymatic activity at least 50%, 60%, 80%, or 90% of a cellular
extract comprising
the native, endogenous full-length polypeptide of the present invention.
Further, the
proteins encoded by polynucleotides of this embodiment will optionally have a
substantially
similar affinity constant (Km) and/or catalytic activity (i.e., the microscopic rate constant,
kcat) as the native endogenous, full-length protein. Those of skill in the art will recognize
that the kcat/Km value determines the specificity for competing substrates and is often referred to
as the specificity constant. Proteins of this embodiment can have a kcat/Km value at least
10% of that of a full-length polypeptide of the present invention as determined using the
endogenous substrate of that polypeptide. Optionally, the kcat/Km value will be at least 20%,
30%, 40%, 50%, and most preferably at least 60%, 70%, 80%, 90%, or 95% the kcat/Km
value of the full-length polypeptide of the present invention. The values of kcat, Km, and
kcat/Km can be determined by any number of means well known to those of skill in the art.
For example, the initial rates (i.e., the first 5% or less of the reaction)
can be determined
using rapid mixing and sampling techniques (e.g., continuous-flow, stopped-
flow, or rapid
quenching techniques), flash photolysis, or relaxation methods (e.g.,
temperature jumps) in
conjunction with such exemplary methods of measuring as spectrophotometry,
spectrofluorimetry, nuclear magnetic resonance, or radioactive procedures.
Kinetic values
are conveniently obtained using a Lineweaver-Burk or Eadie-Hofstee plot.
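As an illustration of the Lineweaver-Burk approach mentioned above, the following sketch fits the double-reciprocal line 1/v = (Km/Vmax)(1/[S]) + 1/Vmax by ordinary least squares and then derives the specificity constant. The function names, the use of unweighted regression, and the assumption that total enzyme concentration is known are choices made for this example only.

```python
def lineweaver_burk(substrate, rates):
    """Estimate (Km, Vmax) from initial rates via a double-reciprocal fit."""
    xs = [1.0 / s for s in substrate]   # 1/[S]
    ys = [1.0 / v for v in rates]       # 1/v
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # slope equals Km / Vmax
    intercept = mean_y - slope * mean_x # intercept equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


def specificity_constant(vmax, enzyme_conc, km):
    """Return kcat/Km, taking kcat = Vmax / total enzyme concentration."""
    kcat = vmax / enzyme_conc
    return kcat / km


def meets_fraction(variant_value, native_value, fraction=0.10):
    """Check whether a variant retains at least `fraction` of the native kcat/Km."""
    return variant_value >= fraction * native_value
```

The default fraction of 0.10 mirrors the "at least 10%" threshold in the text; the higher thresholds listed there can be passed explicitly.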
F. Polynucleotides Complementary to the Polynucleotides of (A)-(E)
As indicated in (f), above, the present invention provides isolated nucleic
acids
comprising polynucleotides complementary to the polynucleotides of paragraphs
A-E,
above. As those of skill in the art will recognize, complementary sequences
base-pair
throughout the entirety of their length with the polynucleotides of sections
(A)-(E) (i.e.,
have 100% sequence identity over their entire length). Complementary bases
associate
through hydrogen bonding in double stranded nucleic acids. For example, the
following
base pairs are complementary: guanine and cytosine; adenine and thymine; and
adenine and
uracil.
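The base-pairing rules listed above can be expressed as a small lookup table. The sketch below is a simplified illustration (function names are hypothetical, and ambiguity codes are not handled); it computes the reverse complement of a strand and checks whether two strands pair over their entire length.

```python
# Watson-Crick partners, written as DNA; U in the input is treated like T.
PAIRS = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the strand that base-pairs with `seq`, read 5' to 3'."""
    comp = "".join(PAIRS[base] for base in reversed(seq.upper()))
    # For an RNA product, adenine pairs with uracil instead of thymine.
    return comp.replace("T", "U") if rna else comp


def is_complementary(a: str, b: str) -> bool:
    """True when `b` pairs with `a` at every position over the full length."""
    return len(a) == len(b) and reverse_complement(a) == b.upper().replace("U", "T")
```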
G. Polynucleotides Which are Subsequences of the Polynucleotides of (A)-(F)
As indicated in (g), above, the present invention provides isolated nucleic
acids
comprising polynucleotides which comprise at least 15 contiguous bases from
the
polynucleotides of sections (A) through (F) as discussed above. The length of
the
polynucleotide is given as an integer selected from the group consisting of
from at least 15
to the length of the nucleic acid sequence of which the polynucleotide is a
subsequence. Thus, for example, polynucleotides of the present invention are inclusive
of
polynucleotides comprising at least 15, 20, 25, 30, 40, 50, 60, 75, or 100
contiguous
nucleotides in length from the polynucleotides of (A)-(F). Optionally, the
number of such
subsequences encoded by a polynucleotide of the instant embodiment can be any
integer
selected from the group consisting of from 1 to 20, such as 2, 3, 4, or 5. The
subsequences
can be separated by any integer of nucleotides from 1 to the number of
nucleotides in the
sequence such as at least 5, 10, 15, 25, 50, 100, or 200 nucleotides.
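The "at least 15 contiguous bases" condition can be checked computationally by indexing every window of the reference sequence. The sketch below is illustrative only; the function name and the default window of 15 are assumptions drawn from the minimum length stated above, and only the given strand is examined.

```python
def shares_contiguous_run(candidate: str, reference: str, min_len: int = 15) -> bool:
    """True if `candidate` contains at least `min_len` contiguous bases of `reference`."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    candidate, reference = candidate.upper(), reference.upper()
    # Collect every window of length min_len from the reference.
    windows = {reference[i:i + min_len] for i in range(len(reference) - min_len + 1)}
    # Any longer shared run necessarily contains a shared window of min_len.
    return any(candidate[i:i + min_len] in windows
               for i in range(len(candidate) - min_len + 1))
```

To cover complementary polynucleotides as well, the same check could be repeated against the reverse complement of the reference.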
Subsequences can be made by in vitro synthetic, in vitro biosynthetic,
or in vivo
recombinant methods. In optional embodiments, subsequences can be made by
nucleic acid
amplification. For example, nucleic acid primers will be constructed to
selectively
hybridize to a sequence (or its complement) within, or co-extensive with, the
coding region.
The subsequences of the present invention can comprise structural
characteristics of
the sequence from which it is derived. Alternatively, the subsequences can
lack certain
structural characteristics of the larger sequence from which it is derived
such as a poly (A)
tail. Optionally, a subsequence from a polynucleotide encoding a polypeptide
having at
least one epitope in common with a prototype polypeptide sequence as provided
in (a),
above, may encode an epitope in common with the prototype sequence.
Alternatively, the
subsequence may not encode an epitope in common with the prototype sequence
but can be
used to isolate the larger sequence by, for example, nucleic acid
hybridization with the
sequence from which it is derived. Subsequences can be used to modulate or
detect gene
expression by introducing into the subsequences compounds which bind,
intercalate, cleave
and/or crosslink to nucleic acids. Exemplary compounds include acridine,
psoralen,
phenanthroline, naphthoquinone, daunomycin or chloroethylaminoaryl conjugates.
H. Polynucleotides From a Full-length Enriched cDNA Library Having the
Physico-Chemical Property of Selectively Hybridizing to a
Polynucleotide of (A)-(G)
As indicated in (h), above, the present invention provides an isolated
polynucleotide
from a full-length enriched cDNA library having the physico-chemical property
of
selectively hybridizing to a polynucleotide of paragraphs (A), (B), (C), (D),
(E), (F), or (G)
as discussed above. Methods of constructing full-length enriched cDNA
libraries are
known in the art and discussed briefly below. The cDNA library comprises at
least 50% to
95% full-length sequences (for example, at least 50%, 60%, 70%, 80%, 90%, or
95% full-
length sequences). The cDNA library can be constructed from a variety of
tissues from a
monocot or dicot at a variety of developmental stages. Exemplary species include maize,
wheat, rice, canola, soybean, cotton, sorghum, sunflower, alfalfa, oats, sugar cane, millet,
and barley. Methods of selectively hybridizing, under selective
hybridization
conditions, a polynucleotide from a full-length enriched library to a
polynucleotide of the
present invention are known to those of ordinary skill in the art. Any number
of stringency
conditions can be employed to allow for selective hybridization. In optional
embodiments,
the stringency allows for selective hybridization of sequences having at least
70%, 75%,
80%, 85%, 90%, 95%, or 98% sequence identity over the length of the hybridized
region.
Full-length enriched cDNA libraries can be normalized to increase the
representation of rare
sequences.
I. Polynucleotide Products Made by a cDNA Isolation Process
As indicated in (i), above, the present invention provides an isolated polynucleotide
made by the process of 1) providing a full-length enriched nucleic acid library, 2)
selectively hybridizing the polynucleotide to a polynucleotide of paragraphs (A), (B), (C),
(D), (E), (F), (G), or (H) as discussed above, and thereby isolating the
polynucleotide from
the nucleic acid library. Full-length enriched nucleic acid libraries are
constructed as
discussed in paragraph (G) and below. Selective hybridization conditions are
as discussed
in paragraph (G). Nucleic acid purification procedures are well known in the
art.
Purification can be conveniently accomplished using solid-phase methods; such
methods
are well known to those of skill in the art and kits are available from
commercial suppliers
such as Advanced Biotechnologies (Surrey, UK). For example, a polynucleotide
of
paragraphs (A)-(H) can be immobilized to a solid support such as a membrane,
bead, or
particle. See, e.g., U.S. Patent No. 5,667,976. The polynucleotide product of
the present
process is selectively hybridized to an immobilized polynucleotide and the
solid support is
subsequently isolated from non-hybridized polynucleotides by methods
including, but not
limited to, centrifugation, magnetic separation, filtration, electrophoresis,
and the like.
Construction of Nucleic Acids
The isolated nucleic acids of the present invention can be made using (a)
standard
recombinant methods, (b) synthetic techniques, or combinations thereof. In
some
embodiments, the polynucleotides of the present invention will be cloned,
amplified, or
otherwise constructed from a monocot such as maize, rice, or wheat, or a dicot
such as
soybean.
The nucleic acids may conveniently comprise sequences in addition to a
polynucleotide of the present invention. For example, a multi-cloning site
comprising one
or more endonuclease restriction sites may be inserted into the nucleic acid
to aid in
isolation of the polynucleotide. Also, translatable sequences may be inserted
to aid in the
isolation of the translated polynucleotide of the present invention. For
example, a hexa-
histidine marker sequence provides a convenient means to purify the proteins
of the present
invention. A polynucleotide of the present invention can be attached to a
vector, adapter, or
linker for cloning and/or expression of a polynucleotide of the present
invention.
Additional sequences may be added to such cloning and/or expression sequences
to
optimize their function in cloning and/or expression, to aid in isolation of
the
polynucleotide, or to improve the introduction of the polynucleotide into a
cell. Typically,
the length of a nucleic acid of the present invention less the length of its
polynucleotide of
the present invention is less than 20 kilobase pairs, often less than 15 kb,
and frequently less
than 10 kb. Use of cloning vectors, expression vectors, adapters, and linkers
is well known
and extensively described in the art. For a description of various nucleic
acids see, for
example, Stratagene Cloning Systems, Catalogs 1999 (La Jolla, CA); and,
Amersham Life
Sciences, Inc, Catalog '99 (Arlington Heights, IL).
A. Recombinant Methods for Constructing Nucleic Acids
The isolated nucleic acid compositions of this invention, such as RNA, cDNA,
genomic DNA, or a hybrid thereof, can be obtained from plant biological
sources using any
number of cloning methodologies known to those of skill in the art. In some
embodiments,
oligonucleotide probes which selectively hybridize, under stringent
conditions, to the
polynucleotides of the present invention are used to identify the desired
sequence in a
cDNA or genomic DNA library. Isolation of RNA, and construction of cDNA and
genomic
libraries is well known to those of ordinary skill in the art. See, e.g.,
Plant Molecular
Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997); and,
Current Protocols in Molecular Biology, Ausubel, et al., Eds., Greene Publishing and
Wiley-Interscience, New York (1995).
A1. Full-length Enriched cDNA Libraries
A number of cDNA synthesis protocols have been described which provide
enriched
full-length cDNA libraries. Enriched full-length cDNA libraries are
constructed to
comprise at least 60%, and more preferably at least 70%, 80%, 90% or 95% full-length
inserts amongst clones containing inserts. The length of insert in such
libraries can be at
least 2, 3, 4, 5, 6, 7, 8, 9, 10 or more kilobase pairs. Vectors to accommodate
inserts of these
sizes are known in the art and available commercially. See, e.g., Stratagene's
lambda ZAP
Express (cDNA cloning vector with 0 to 12 kb cloning capacity). An exemplary
method of
constructing a greater than 95% pure full-length cDNA library is described by
Carninci et
al., Genomics, 37:327-336 (1996). Other methods for producing full-length
libraries are
known in the art. See, e.g., Edery et al., Mol. Cell Biol., 15(6):3363-3371
(1995); and, PCT
Application WO 96/34981.
A2. Normalized or Subtracted cDNA Libraries
A non-normalized cDNA library represents the mRNA population of the tissue it
was made from. Since unique clones are outnumbered by clones derived from highly
expressed genes, their isolation can be laborious. Normalization of a cDNA
library is the
process of creating a library in which each clone is more equally represented.
Construction
of normalized libraries is described in Ko, Nucl. Acids. Res., 18(19):5705-
5711 (1990);
Patanjali et al., Proc. Natl. Acad. Sci. U.S.A., 88:1943-1947 (1991); U.S. Patents
5,482,685,
5,482,845, and 5,637,685. In an exemplary method described by Soares et al.,
normalization resulted in reduction of the abundance of clones from a range of
four orders
of magnitude to a narrow range of only 1 order of magnitude. Proc. Natl. Acad.
Sci. USA,
91:9228-9232 (1994).
Subtracted cDNA libraries are another means to increase the proportion of less
abundant cDNA species. In this procedure, cDNA prepared from one pool of mRNA
is
depleted of sequences present in a second pool of mRNA by hybridization. The
cDNA:mRNA hybrids are removed and the remaining un-hybridized cDNA pool is
enriched for sequences unique to that pool. See, Foote et al. in, Plant
Molecular Biology: A
Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997); Kho and Zarbl,
Technique,
3(2):58-63 (1991); Sive and St. John, Nucl. Acids Res., 16(22):10937 (1988);
Current
Protocols in Molecular Biology, Ausubel, et al., Eds., Greene Publishing and
Wiley-
Interscience, New York (1995); and, Swaroop et al., Nucl. Acids Res.,
19(8):1954 (1991).
cDNA subtraction kits are commercially available. See, e.g., PCR-Select
(Clontech, Palo
Alto, CA).
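The logic of subtraction described above can be modeled very loosely as removing from one pool every species that also appears in a second pool. The sketch below is a toy model under strong assumptions: sequences are represented by hypothetical gene labels, and hybridization is treated as an exact match, which ignores kinetics, abundance, and incomplete removal.

```python
def subtract_library(tester_cdnas, driver_mrnas):
    """Return tester cDNA species that have no counterpart in the driver pool."""
    driver = set(driver_mrnas)
    # Species that would form cDNA:mRNA hybrids are removed; the rest remain.
    return [species for species in tester_cdnas if species not in driver]
```

For example, subtracting a driver pool containing only `geneB` from a tester pool of `geneA`, `geneB`, and `geneC` leaves `geneA` and `geneC`.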
To construct genomic libraries, large segments of genomic DNA are generated by
fragmentation, e.g. using restriction endonucleases, and are ligated with
vector DNA to
form concatemers that can be packaged into the appropriate vector.
Methodologies to
accomplish these ends, and sequencing methods to verify the sequence of
nucleic acids are
well known in the art. Examples of appropriate molecular biological techniques
and
instructions sufficient to direct persons of skill through many construction,
cloning, and
screening methodologies are found in Sambrook, et al., Molecular Cloning: A Laboratory
Manual, 2nd Ed., Cold Spring Harbor Laboratory Vols. 1-3 (1989); Methods in
Enzymology, Vol. 152: Guide to Molecular Cloning Techniques, Berger and Kimmel, Eds.,
San Diego: Academic Press, Inc. (1987); Current Protocols in Molecular Biology, Ausubel,
et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995); and Plant Molecular
Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997). Kits
for
construction of genomic libraries are also commercially available.
The cDNA or genomic library can be screened using a probe based upon the
sequence of a polynucleotide of the present invention such as those disclosed
herein.
Probes may be used to hybridize with genomic DNA or cDNA sequences to isolate
homologous genes in the same or different plant species. Those of skill in the
art will
appreciate that various degrees of stringency of hybridization can be employed
in the assay;
and either the hybridization or the wash medium can be stringent.
The nucleic acids of interest can also be amplified from nucleic acid samples
using
amplification techniques. For instance, polymerase chain reaction (PCR)
technology can be
used to amplify the sequences of polynucleotides of the present invention and
related genes
directly from genomic DNA or cDNA libraries. PCR and other in vitro
amplification
methods may also be useful, for example, to clone nucleic acid sequences that
code for
proteins to be expressed, to make nucleic acids to use as probes for detecting
the presence
of the desired mRNA in samples, for nucleic acid sequencing, or for other
purposes. The T4
gene 32 protein (Boehringer Mannheim) can be used to improve yield of long PCR
products.
PCR-based screening methods have been described. Wilfinger et al. describe a
PCR-based method in which the longest cDNA is identified in the first step so
that
incomplete clones can be eliminated from study. BioTechniques, 22(3): 481-486
(1997).
Such methods are particularly effective in combination with a full-length cDNA
construction methodology, above.
B. Synthetic Methods for Constructing Nucleic Acids
The isolated nucleic acids of the present invention can also be prepared by
direct
chemical synthesis by methods such as the phosphotriester method of Narang et
al., Meth.
Enzymol. 68: 90-99 (1979); the phosphodiester method of Brown et al., Meth.
Enzymol. 68:
109-151 (1979); the diethylphosphoramidite method of Beaucage et al.,
Tetra. Lett. 22:
1859-1862 (1981); the solid phase phosphoramidite triester method described by
Beaucage
and Caruthers, Tetra. Letts. 22(20): 1859-1862 (1981), e.g., using an
automated synthesizer,
e.g., as described in Needham-VanDevanter et al., Nucleic Acids Res., 12: 6159-
6168
(1984); and, the solid support method of U.S. Patent No. 4,458,066. Chemical
synthesis
generally produces a single stranded oligonucleotide. This may be converted
into double
stranded DNA by hybridization with a complementary sequence, or by
polymerization with
a DNA polymerase using the single strand as a template. One of skill will
recognize that
while chemical synthesis of DNA is best employed for sequences of about 100
bases or less,
longer sequences may be obtained by the ligation of shorter sequences.
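The remark about building longer sequences from shorter synthetic pieces can be illustrated by planning how a target is divided into fragments of about 100 bases or less. This is a minimal sketch: the function name and the hard 100-base limit are assumptions, and a practical design would also plan complementary strands with overlapping ends so that adjacent fragments can anneal before ligation.

```python
def split_for_synthesis(target: str, max_len: int = 100):
    """Split `target` into consecutive fragments no longer than `max_len` bases."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    return [target[i:i + max_len] for i in range(0, len(target), max_len)]
```

Joining the returned fragments in order reproduces the original target, which corresponds to the end product of ligating them head to tail.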
Recombinant Expression Cassettes
The present invention further provides recombinant expression cassettes
comprising
a nucleic acid of the present invention. A nucleic acid sequence coding for
the desired
polypeptide of the present invention, for example a cDNA or a genomic sequence
encoding
a full length polypeptide of the present invention, can be used to construct a
recombinant
expression cassette which can be introduced into the desired host cell. A
recombinant
expression cassette will typically comprise a polynucleotide of the present
invention
operably linked to transcriptional initiation regulatory sequences which will
direct the
transcription of the polynucleotide in the intended host cell, such as tissues
of a transformed
plant.
For example, plant expression vectors may include (1) a cloned plant gene
under the transcriptional control of 5' and 3' regulatory sequences and (2) a
dominant selectable marker. Such plant expression vectors may also contain, if
desired, a promoter regulatory region (e.g., one conferring inducible or
constitutive, environmentally- or developmentally-regulated, or cell- or
tissue-specific/selective expression), a transcription initiation start site,
a ribosome
binding site, an RNA processing signal, a transcription termination site,
and/or
a polyadenylation signal.
A plant promoter fragment can be employed which will direct expression of a
polynucleotide of the present invention in all tissues of a regenerated plant.
Such promoters
are referred to herein as "constitutive" promoters and are active under most
environmental
conditions and states of development or cell differentiation. Examples of
constitutive
promoters include the cauliflower mosaic virus (CaMV) 35S transcription
initiation region,
the 1'- or 2'- promoter derived from T-DNA of Agrobacterium
tumefaciens, the ubiquitin 1
promoter, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S.
Patent
No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, and
the
GRP1-8 promoter.
Alternatively, the plant promoter can direct expression of a polynucleotide of
the
present invention in a specific tissue or may be otherwise under more precise
environmental
or developmental control. Such promoters are referred to here as
"inducible" promoters.
Environmental conditions that may affect transcription by inducible promoters
include
pathogen attack, anaerobic conditions, or the presence of light. Examples of
inducible
promoters are the Adh1 promoter which is inducible by hypoxia or cold stress,
the Hsp70
promoter which is inducible by heat stress, and the PPDK promoter which is
inducible by
light.
Examples of promoters under developmental control include promoters that
initiate
transcription only, or preferentially, in certain tissues, such as leaves,
roots, fruit, seeds, or
flowers. Exemplary promoters include the anther-specific promoter 5126 (U.S.
Patent Nos.
5,689,049 and 5,689,051), glb-1 promoter, and gamma-zein promoter. Also see,
for
example, U.S. patent applications 60/155,859, and 60/163,114. The operation of
a promoter
may also vary depending on its location in the genome. Thus, an inducible
promoter may
become fully or partially constitutive in certain locations.
Both heterologous and non-heterologous (i.e., endogenous) promoters can be
employed to direct expression of the nucleic acids of the present invention.
These
promoters can also be used, for example, in recombinant expression cassettes
to drive
expression of antisense nucleic acids to reduce, increase, or alter
concentration and/or
composition of the proteins of the present invention in a desired tissue.
Thus, in some
embodiments, the nucleic acid construct will comprise a promoter, functional
in a plant cell,
operably linked to a polynucleotide of the present invention. Promoters useful
in these
embodiments include the endogenous promoters driving expression of a
polypeptide of the
present invention.
In some embodiments, isolated nucleic acids which serve as promoter or
enhancer
elements can be introduced in the appropriate position (generally upstream) of
a non-
heterologous form of a polynucleotide of the present invention so as to up or
down regulate
expression of a polynucleotide of the present invention. For example,
endogenous
promoters can be altered in vivo by mutation, deletion, and/or substitution
(see, Kmiec, U.S.
Patent 5,565,350; Zarling et al., PCT/US93/03868), or isolated promoters can
be introduced
into a plant cell in the proper orientation and distance from a cognate gene
of a
polynucleotide of the present invention so as to control the expression of the
gene. Gene
expression can be modulated under conditions suitable for plant growth so as
to alter the
total concentration and/or alter the composition of the polypeptides of the
present invention
in plant cell. Thus, the present invention provides compositions, and methods
for making,
heterologous promoters and/or enhancers operably linked to a native,
endogenous (i.e., non-
heterologous) form of a polynucleotide of the present invention.
If polypeptide expression is desired, it is generally desirable to include a
polyadenylation region at the 3'-end of a polynucleotide coding region. The
polyadenylation region can be derived from the natural gene, from a variety of
other plant
genes, or from T-DNA. The 3' end sequence to be added can be derived from, for
example,
the nopaline synthase or octopine synthase genes, or alternatively from
another plant gene,
or less preferably from any other eukaryotic gene.
An intron sequence can be added to the 5' untranslated region or the coding
sequence of the partial coding sequence to increase the amount of the mature
message that
accumulates in the cytosol. Inclusion of a spliceable intron in the
transcription unit in both
plant and animal expression constructs has been shown to increase gene
expression at both
the mRNA and protein levels up to 1000-fold. Buchman and Berg, Mol. Cell Biol.
8: 4395-
4405 (1988); Callis et al., Genes Dev. 1: 1183-1200 (1987). Such intron
enhancement of
gene expression is typically greatest when placed near the 5' end of the
transcription unit.
Use of the maize Adh1-S introns 1, 2, and 6, and the Bronze-1 intron, is known
in the art.
See generally, The Maize Handbook, Chapter 116, Freeling and Walbot, Eds.,
Springer,
New York (1994). The vector comprising the sequences from a polynucleotide of
the
present invention will typically comprise a marker gene which confers a
selectable
phenotype on plant cells. Typical vectors useful for expression of genes in
higher plants are
well known in the art and include vectors derived from the tumor-inducing (Ti)
plasmid of
Agrobacterium tumefaciens described by Rogers et al., Meth. in Enzymol.,
153:253-277
(1987).
A polynucleotide of the present invention can be expressed in either sense or
anti-
sense orientation as desired. It will be appreciated that control of gene
expression in either
sense or anti-sense orientation can have a direct impact on the observable
plant
characteristics. Antisense technology can be conveniently used to inhibit gene
expression
in plants. To accomplish this, a nucleic acid segment from the desired gene is
cloned and
operably linked to a promoter such that the anti-sense strand of RNA will be
transcribed.
The construct is then transformed into plants and the antisense strand of RNA
is produced.
In plant cells, it has been shown that antisense RNA inhibits gene expression
by preventing
the accumulation of mRNA which encodes the enzyme of interest, see, e.g.,
Sheehy et al.,
Proc. Nat'l. Acad. Sci. (USA) 85: 8805-8809 (1988); and Hiatt et al., U.S.
Patent No.
4,801,340.
Another method of suppression is sense suppression (i.e., co-suppression).
Introduction of nucleic acid configured in the sense orientation has been
shown to be an
effective means by which to block the transcription of target genes. For an
example of the
use of this method to modulate expression of endogenous genes see, Napoli et
al., The Plant
Cell 2: 279-289 (1990) and U.S. Patent No. 5,034,323.
Catalytic RNA molecules or ribozymes can also be used to inhibit expression of
plant genes. It is possible to design ribozymes that specifically pair with
virtually any target
RNA and cleave the phosphodiester backbone at a specific location, thereby
functionally
inactivating the target RNA. In carrying out this cleavage, the ribozyme is
not itself altered,
and is thus capable of recycling and cleaving other molecules, making it a
true enzyme.
The inclusion of ribozyme sequences within antisense RNAs confers RNA-cleaving
activity
upon them, thereby increasing the activity of the constructs. The design and
use of target
RNA-specific ribozymes is described in Haseloff et al., Nature 334: 585-591
(1988).
A variety of cross-linking agents, alkylating agents and radical generating
species as
pendant groups on polynucleotides of the present invention can be used to
bind, label,
detect, and/or cleave nucleic acids. For example, Vlassov, V. V., et al.,
Nucleic Acids Res
(1986) 14:4065-4076, describe covalent bonding of a single-stranded DNA
fragment with
alkylating derivatives of nucleotides complementary to target sequences. A
report of similar
work by the same group is that by Knorre, D. G., et al., Biochimie (1985)
67:785-789.
Iverson and Dervan also showed sequence-specific cleavage of single-stranded
DNA
mediated by incorporation of a modified nucleotide which was capable of
activating
cleavage (J Am Chem Soc (1987) 109:1241-1243). Meyer, R. B., et al., J Am Chem
Soc
(1989) 111:8517-8519, effect covalent crosslinking to a target nucleotide
using an
alkylating agent complementary to the single-stranded target nucleotide
sequence. A
photoactivated crosslinking to single-stranded oligonucleotides mediated by
psoralen was
disclosed by Lee, B. L., et al., Biochemistry (1988) 27:3197-3203. Use of
crosslinking in
triple-helix forming probes was also disclosed by Horne, et al., J Am Chem Soc
(1990)
112:2435-2437. Use of N4, N4-ethanocytosine as an alkylating agent to
crosslink to single-
stranded oligonucleotides has also been described by Webb and Matteucci, J Am
Chem Soc
(1986) 108:2764-2765; Nucleic Acids Res (1986) 14:7661-7674; Feteritz et al.,
J. Am.
Chem. Soc. 113:4000 (1991). Various compounds to bind, detect, label, and/or
cleave
nucleic acids are known in the art. See, for example, U.S. Patent Nos.
5,543,507; 5,672,593;
5,484,908; 5,256,648; and, 5,681,941.
Proteins
The isolated proteins of the present invention comprise a polypeptide having
at least
10 amino acids from a polypeptide of the present invention (or conservative
variants
thereof) such as those encoded by any one of the polynucleotides of the
present invention as
discussed more fully above (e.g., Table 1). The proteins of the present
invention or variants
thereof can comprise any number of contiguous amino acid residues from a
polypeptide of
the present invention, wherein that number is selected from the group of
integers consisting
of from 10 to the number of residues in a full-length polypeptide of the
present invention.
Optionally, this subsequence of contiguous amino acids is at least 15, 20, 25,
30, 35, or 40
amino acids in length, often at least 50, 60, 70, 80, or 90 amino acids in
length. Further, the
number of such subsequences can be any integer selected from the group
consisting of from
1 to 20, such as 2, 3, 4, or 5.
The present invention further provides a protein comprising a polypeptide
having a
specified sequence identity/similarity with a polypeptide of the present
invention. The
percentage of sequence identity/similarity is an integer selected from the
group consisting of
from 50 to 99. Exemplary sequence identity/similarity values include 55%, 60%,
65%,
70%, 75%, 80%, 85%, 90%, and 95%. Sequence identity can be determined using,
for
example, the GAP, CLUSTALW, or BLAST algorithms.
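For illustration, percent identity over an existing pairwise alignment can be computed as below. This sketch assumes the two sequences have already been aligned (for instance by one of the programs named above) with "-" as the gap character, and it counts gapped columns in the denominator; the named programs may define the denominator differently, so values need not match their reports.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both rows are gaps; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)
```

A result can then be compared against any chosen threshold in the 50 to 99 range described above.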
As those of skill will appreciate, the present invention includes, but is not
limited to,
catalytically active polypeptides of the present invention (i.e., enzymes).
Catalytically
active polypeptides have a specific activity of at least 20%, 30%, or 40%, and
preferably at
least 50%, 60%, or 70%, and most preferably at least 80%, 90%, or 95% that of
the native
(non-synthetic), endogenous polypeptide. Further, the substrate specificity
(kcat/Km) is
optionally substantially similar to the native (non-synthetic), endogenous polypeptide.
Typically, the Km will be at least 30%, 40%, or 50%, that of the native (non-synthetic),
endogenous polypeptide; and more preferably at least 60%, 70%, 80%, or 90%. Methods of
assaying and quantifying measures of enzymatic activity and substrate specificity (kcat/Km)
are well known to those of skill in the art.
Generally, the proteins of the present invention will, when presented as an
immunogen, elicit production of an antibody specifically reactive to a
polypeptide of the
present invention. Further, the proteins of the present invention will not
bind to antisera
raised against a polypeptide of the present invention which has been fully
immunosorbed
with the same polypeptide. Immunoassays for determining binding are well known
to those
of skill in the art. A preferred immunoassay is a competitive immunoassay.
Thus, the
proteins of the present invention can be employed as immunogens for
constructing
antibodies immunoreactive to a protein of the present invention for such
exemplary utilities
as immunoassays or protein purification techniques.
Expression of Proteins in Host Cells
Using the nucleic acids of the present invention, one may express a protein of
the
present invention in a recombinantly engineered cell such as bacteria, yeast,
insect,
mammalian, or preferably plant cells. The cells produce the protein in a non-
natural
condition (e.g., in quantity, composition, location, and/or time), because
they have been
genetically altered through human intervention to do so.
It is expected that those of skill in the art are knowledgeable in the
numerous
expression systems available for expression of a nucleic acid encoding a
protein of the
present invention. No attempt to describe in detail the various methods known
for the
expression of proteins in prokaryotes or eukaryotes will be made.
In brief summary, the expression of isolated nucleic acids encoding a protein
of the
present invention will typically be achieved by operably linking, for example,
the DNA or
cDNA to a promoter (which is either constitutive or regulatable), followed by
incorporation
into an expression vector. The vectors can be suitable for replication and
integration in
either prokaryotes or eukaryotes. Typical expression vectors contain
transcription and
translation terminators, initiation sequences, and promoters useful for
regulation of the
expression of the DNA encoding a protein of the present invention. To obtain
high level
expression of a cloned gene, it is desirable to construct expression vectors
which contain, at
the minimum, a strong promoter to direct transcription, a ribosome binding
site for
translational initiation, and a transcription/translation terminator. One of
skill would
recognize that modifications can be made to a protein of the present invention
without
diminishing its biological activity. Some modifications may be made to
facilitate the
cloning, expression, or incorporation of the targeting molecule into a fusion
protein. Such
modifications are well known to those of skill in the art and include, for
example, a
methionine added at the amino terminus to provide an initiation site, or
additional amino
acids (e.g., poly His) placed on either terminus to create conveniently
located purification
sequences. Restriction sites or termination codons can also be introduced.
Synthesis of Proteins
The proteins of the present invention can be constructed using non-cellular
synthetic
methods. Solid phase synthesis of proteins of less than about 50 amino acids
in length may
be accomplished by attaching the C-terminal amino acid of the sequence to an
insoluble
support followed by sequential addition of the remaining amino acids in the
sequence.
Techniques for solid phase synthesis are described by Barany and Merrifield,
Solid-Phase
Peptide Synthesis, pp. 3-284 in The Peptides: Analysis, Synthesis, Biology.
Vol. 2: Special
Methods in Peptide Synthesis, Part A.; Merrifield, et al., J. Am. Chem. Soc.
85: 2149-2156
(1963), and Stewart et al., Solid Phase Peptide Synthesis, 2nd ed., Pierce
Chem. Co.,
Rockford, Ill. (1984). Proteins of greater length may be synthesized by
condensation of the
amino and carboxy termini of shorter fragments. Methods of forming peptide
bonds by
activation of a carboxy terminal end (e.g., by the use of the coupling reagent
N,N'-dicyclohexylcarbodiimide) are known to those of skill.
Purification of Proteins
The proteins of the present invention may be purified by standard techniques
well
known to those of skill in the art. Recombinantly produced proteins of the
present invention
can be directly expressed or expressed as a fusion protein. The recombinant
protein is
purified by a combination of cell lysis (e.g., sonication, French press) and
affinity
chromatography. For fusion products, subsequent digestion of the fusion
protein with an
appropriate proteolytic enzyme releases the desired recombinant protein.
The proteins of this invention, recombinant or synthetic, may be purified to
substantial purity by standard techniques well known in the art, including
detergent
solubilization, selective precipitation with such substances as ammonium
sulfate, column
chromatography, immunopurification methods, and others. See, for instance, R.
Scopes,
Protein Purification: Principles and Practice, Springer-Verlag: New York
(1982);
Deutscher, Guide to Protein Purification, Academic Press (1990). For example,
antibodies
may be raised to the proteins as described herein. Purification from E. coli
can be achieved
following procedures described in U.S. Patent No. 4,511,503. The protein may
then be
isolated from cells expressing the protein and further purified by standard
protein chemistry
techniques as described herein. Detection of the expressed protein is achieved
by methods
known in the art and include, for example, radioimmunoassays, Western blotting
techniques
or immunoprecipitation.
Introduction of Nucleic Acids Into Host Cells
The method of introducing a nucleic acid of the present invention into a host
cell is
not critical to the instant invention. Transformation or transfection methods
are
conveniently used. Accordingly, a wide variety of methods have been developed
to insert a
DNA sequence into the genome of a host cell to obtain the transcription and/or
translation
of the sequence to effect phenotypic changes in the organism. Thus, any method
which
provides for effective introduction of a nucleic acid may be employed.
A. Plant Transformation
A nucleic acid comprising a polynucleotide of the present invention is
optionally
introduced into a plant. Generally, the polynucleotide will first be
incorporated into a
recombinant expression cassette or vector. Isolated nucleic acids of the
present
invention can be introduced into plants according to techniques known in the
art.
Techniques for transforming a wide variety of higher plant species are well
known and
described in the technical, scientific, and patent literature. See, for
example, Weising et al.,
Ann. Rev. Genet. 22: 421-477 (1988). For example, the DNA construct may be
introduced
directly into the genomic DNA of the plant cell using techniques such as
electroporation,
polyethylene glycol (PEG) poration, particle bombardment, silicon fiber
delivery, or
microinjection of plant cell protoplasts or embryogenic callus. See, e.g.,
Tomes, et al.,
Direct DNA Transfer into Intact Plant Cells Via Microprojectile Bombardment.
pp.197-213
in Plant Cell, Tissue and Organ Culture, Fundamental Methods. eds. O. L.
Gamborg and
G.C. Phillips. Springer-Verlag Berlin Heidelberg New York, 1995; see, U.S.
Patent No.
5,990,387. The introduction of DNA constructs using PEG precipitation is
described in
Paszkowski et al., Embo J. 3: 2717-2722 (1984). Electroporation techniques are
described
in Fromm et al., Proc. Natl. Acad. Sci. (USA) 82: 5824 (1985). Ballistic
transformation
techniques are described in Klein et al., Nature 327: 70-73 (1987).
Agrobacterium tumefaciens-mediated transformation techniques are well
described
in the scientific literature. See, for example Horsch et al., Science 233:
496-498 (1984);
Fraley et al., Proc. Natl. Acad. Sci. (USA) 80: 4803 (1983); and, Plant
Molecular Biology:
A Laboratory Manual, Chapter 8, Clark, Ed., Springer-Verlag, Berlin (1997).
The DNA
constructs may be combined with suitable T-DNA flanking regions and introduced
into a
conventional Agrobacterium tumefaciens host vector. The virulence functions of
the
Agrobacterium tumefaciens host will direct the insertion of the construct and
adjacent
marker into the plant cell DNA when the cell is infected by the bacteria. See,
U.S. Patent
No. 5,591,616. Although Agrobacterium is useful primarily in dicots, certain
monocots can
be transformed by Agrobacterium. For instance, Agrobacterium
transformation of maize is
described in U.S. Patent No. 5,550,318.
Other methods of transfection or transformation include (1) Agrobacterium
rhizogenes-mediated transformation (see, e.g., Lichtenstein and Fuller In:
Genetic
Engineering, vol. 6, PWJ Rigby, Ed., London, Academic Press, 1987; and
Lichtenstein, C.
P., and Draper, J., In: DNA Cloning, Vol. II, D. M. Glover, Ed., Oxford, IRL
Press, 1985),
Application PCT/US87/02512 (WO 88/02405 published Apr. 7, 1988) describes the
use of
A. rhizogenes strain A4 and its Ri plasmid along with A. tumefaciens vectors
pARC8 or
pARC16, (2) liposome-mediated DNA uptake (see, e.g., Freeman et al., Plant Cell
Physiol.
25: 1353 (1984)), (3) the vortexing method (see, e.g., Kindle, Proc. Natl.
Acad. Sci., (USA)
87: 1228 (1990)).
DNA can also be introduced into plants by direct DNA transfer into pollen as
described by Zhou et al., Methods in Enzymology, 101:433 (1983); D. Hess,
Intern Rev.
Cytol., 107:367 (1987); Luo et al., Plant Mol. Biol. Reporter, 6:165
(1988). Expression of polypeptide coding genes can be obtained by injection of
the DNA
into reproductive organs of a plant as described by Pena et al., Nature,
325:274 (1987).
DNA can also be injected directly into the cells of immature embryos and the
rehydration of
desiccated embryos as described by Neuhaus et al., Theor. Appl. Genet., 75:30
(1987); and
Benbrook et al., in Proceedings Bio Expo 1986, Butterworth, Stoneham, Mass.,
pp. 27-54
(1986). A variety of plant viruses that can be employed as vectors are known
in the art and
include cauliflower mosaic virus (CaMV), geminivirus, brome mosaic virus, and
tobacco
mosaic virus.
B. Transfection of Prokaryotes, Lower Eukaryotes, and Animal Cells
Animal and lower eukaryotic (e.g., yeast) host cells are competent or rendered
competent for transfection by various means. There are several well-known
methods of
introducing DNA into animal cells. These include: calcium phosphate
precipitation, fusion
of the recipient cells with bacterial protoplasts containing the DNA,
treatment of the
recipient cells with liposomes containing the DNA, DEAE dextran,
electroporation,
biolistics, and micro-injection of the DNA directly into the cells. The
transfected cells are
cultured by means well known in the art. Kuchler, R.J., Biochemical Methods in
Cell
Culture and Virology, Dowden, Hutchinson and Ross, Inc. (1977).
Transgenic Plant Regeneration
Plant cells which directly result or are derived from the nucleic acid
introduction
techniques can be cultured to regenerate a whole plant which possesses the
introduced
genotype. Such regeneration techniques often rely on manipulation of certain
phytohormones in a tissue culture growth medium. Plant cells can be
regenerated, e.g.,
from single cells, callus tissue or leaf discs according to standard plant
tissue culture
techniques. It is well known in the art that various cells, tissues, and
organs from almost
any plant can be successfully cultured to regenerate an entire plant. Plant
regeneration from
cultured protoplasts is described in Evans et al., Protoplasts Isolation and
Culture,
Handbook of Plant Cell Culture, Macmillan Publishing Company, New York, pp.
124-176
(1983); and Binding, Regeneration of Plants, Plant Protoplasts, CRC
Press, Boca Raton,
pp. 21-73 (1985).
The regeneration of plants from either single plant protoplasts or various
explants is
well known in the art. See, for example, Methods for Plant Molecular Biology,
A.
Weissbach and H. Weissbach, eds., Academic Press, Inc., San Diego, Calif.
(1988). This
regeneration and growth process includes the steps of selection of
transformant cells and
shoots, rooting the transformant shoots and growth of the plantlets in soil.
For maize cell
culture and regeneration see generally, The Maize Handbook, Freeling and
Walbot, Eds.,
Springer, New York (1994); Corn and Corn Improvement, 3rd edition, Sprague and
Dudley
Eds., American Society of Agronomy, Madison, Wisconsin (1988). For
transformation and
regeneration of maize see, Gordon-Kamm et al., The Plant Cell, 2:603-618
(1990).
The regeneration of plants containing the polynucleotide of the present
invention
and introduced by Agrobacterium from leaf explants can be achieved as
described by
Horsch et al., Science, 227:1229-1231 (1985). In this procedure, transformants
are grown
in the presence of a selection agent and in a medium that induces the
regeneration of shoots
in the plant species being transformed as described by Fraley et al., Proc.
Natl. Acad. Sci.
(U.S.A.), 80:4803 (1983). This procedure typically produces shoots within two
to four
weeks and these transformant shoots are then transferred to an appropriate
root-inducing
medium containing the selective agent and an antibiotic to prevent bacterial
growth.
Transgenic plants of the present invention may be fertile or sterile.
One of skill will recognize that after the recombinant expression cassette is
stably
incorporated in transgenic plants and confirmed to be operable, it can be
introduced into
other plants by sexual crossing. Any of a number of standard breeding
techniques can be
used, depending upon the species to be crossed. In vegetatively propagated
crops, mature
transgenic plants can be propagated by the taking of cuttings or by tissue
culture techniques
to produce multiple identical plants. Selection of desirable transgenics is
made and new
varieties are obtained and propagated vegetatively for commercial use. In seed
propagated
crops, mature transgenic plants can be self crossed to produce a homozygous
inbred plant.
The inbred plant produces seed containing the newly introduced heterologous
nucleic acid.
These seeds can be grown to produce plants that would produce the selected
phenotype.
Parts obtained from the regenerated plant, such as flowers, seeds, leaves,
branches, fruit,
and the like are included in the invention, provided that these parts comprise
cells
comprising the isolated nucleic acid of the present invention. Progeny and
variants, and
mutants of the regenerated plants are also included within the scope of the
invention,
provided that these parts comprise the introduced nucleic acid sequences.
Transgenic
plants expressing a polynucleotide of the present invention can be screened
for transmission
of the nucleic acid of the present invention by, for example, standard
immunoblot and DNA
detection techniques. Expression at the RNA level can be determined initially
to identify
and quantitate expression-positive plants. Standard techniques for RNA
analysis can be
employed and include PCR amplification assays using oligonucleotide primers
designed to
amplify only the heterologous RNA templates and solution hybridization assays
using
heterologous nucleic acid-specific probes. The RNA-positive plants can then be
analyzed for
protein expression by Western immunoblot analysis using the specifically
reactive
antibodies of the present invention. In addition, in situ hybridization
and
immunocytochemistry according to standard protocols can be done using
heterologous
nucleic acid specific polynucleotide probes and antibodies, respectively, to
localize sites of
expression within transgenic tissue. Generally, a number of transgenic lines
are usually
screened for the incorporated nucleic acid to identify and select plants with
the most
appropriate expression profiles.
A preferred embodiment is a transgenic plant that is homozygous for the added
heterologous nucleic acid; i.e., a transgenic plant that contains two added
nucleic acid
sequences, one gene at the same locus on each chromosome of a chromosome pair.
A
homozygous transgenic plant can be obtained by sexually mating (selfing) a
heterozygous
transgenic plant that contains a single added heterologous nucleic acid,
germinating some of
the seed produced and analyzing the resulting plants produced for altered
expression of a
polynucleotide of the present invention relative to a control plant (i.e.,
native, non-
transgenic). Back-crossing to a parental plant and out-crossing with a non-
transgenic plant
are also contemplated.
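To make the expected outcome of selfing concrete, the following Python sketch enumerates the offspring genotypes of a plant carrying one copy of the added nucleic acid at a single locus. It assumes a single insertion that segregates as a simple Mendelian locus with equally viable gametes, which the text does not state. The allele labels "T" (added nucleic acid present) and "-" (absent) and the function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def selfing_genotype_ratios(parent=("T", "-")):
    """Expected offspring genotype fractions when the parent is self-crossed.

    Each gamete carries one of the parent's two alleles with equal probability
    (an assumption of this sketch).
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent, repeat=2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


ratios = selfing_genotype_ratios()
# "TT" is homozygous for the added nucleic acid, "-T" is heterozygous,
# and "--" lacks it entirely.
```

Under these assumptions only a fraction of the germinated seed is homozygous, which is why the resulting plants are analyzed individually.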
Modulating Polypeptide Levels and/or Composition
The present invention further provides a method for modulating (i.e.,
increasing or
decreasing) the concentration or ratio of the polypeptides of the present
invention in a plant
or part thereof. Modulation can be effected by increasing or decreasing the
concentration
and/or the ratio of the polypeptides of the present invention in a plant. The
method
comprises introducing into a plant cell a recombinant expression cassette
comprising a
polynucleotide of the present invention as described above to obtain a
transgenic plant cell,
culturing the transgenic plant cell under transgenic plant cell growing
conditions, and
inducing or repressing expression of a polynucleotide of the present invention
in the
transgenic plant for a time sufficient to modulate concentration and/or the
ratios of the
polypeptides in the transgenic plant or plant part.
In some embodiments, the concentration and/or ratios of polypeptides of the
present
invention in a plant may be modulated by altering, in vivo or in vitro, the
promoter of a gene
to up- or down-regulate gene expression. In some embodiments, the coding
regions of
native genes of the present invention can be altered via substitution,
addition, insertion, or
deletion to decrease activity of the encoded enzyme. See, e.g., Kmiec, U.S.
Patent
5,565,350; Zarling et al., PCT/US93/03868. And in some embodiments, an
isolated nucleic
acid (e.g., a vector) comprising a promoter sequence is transfected into a
plant cell.
Subsequently, a plant cell comprising the promoter operably linked to a
polynucleotide of
the present invention is selected for by means known to those of skill in the
art such as, but
not limited to, Southern blot, DNA sequencing, or PCR analysis using primers
specific to
the promoter and to the gene and detecting amplicons produced therefrom. A
plant or plant
part altered or modified by the foregoing embodiments is grown under plant
forming
conditions for a time sufficient to modulate the concentration and/or ratios
of polypeptides
of the present invention in the plant. Plant forming conditions are well known
in the art and
discussed briefly, supra.
In general, concentration or the ratios of the polypeptides is increased or
decreased
by at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% relative to a
native
control plant, plant part, or cell lacking the aforementioned recombinant
expression cassette.
Modulation in the present invention may occur during and/or subsequent to
growth of the
plant to the desired stage of development. Modulating nucleic acid expression
temporally
and/or in particular tissues can be controlled by employing the appropriate
promoter
operably linked to a polynucleotide of the present invention in, for example,
sense or
antisense orientation as discussed in greater detail, supra. Induction of
expression of a
polynucleotide of the present invention can also be controlled by exogenous
administration
of an effective amount of inducing compound. Inducible promoters and inducing
compounds which activate expression from these promoters are well known in the
art. In
preferred embodiments, the polypeptides of the present invention are modulated
in
monocots, particularly maize.
UTRs and Codon Preference
In general, translational efficiency has been found to be regulated by
specific
sequence elements in the 5' non-coding or untranslated region (5' UTR) of the
RNA.
Positive sequence motifs include translational initiation consensus sequences
(Kozak,
Nucleic Acids Res. 15:8125 (1987)) and the 7-methylguanosine cap structure
(Drummond et
al., Nucleic Acids Res. 13:7375 (1985)). Negative elements include stable
intramolecular 5'
UTR stem-loop structures (Muesing et al., Cell 48:691 (1987)) and AUG
sequences or short
open reading frames preceded by an appropriate AUG in the 5' UTR (Kozak,
supra, Rao
et al., Mol. and Cell. Biol. 8:284 (1988)). Accordingly, the present
invention provides 5'
and/or 3' untranslated regions for modulation of translation of heterologous
coding
sequences.
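One of the negative elements named above, an AUG or short open reading frame within the 5' UTR, can be located by a simple scan. The Python sketch below is a hedged illustration of that idea only: the function names are hypothetical, positions are 0-based, and no attempt is made to score initiation context or stem-loop stability.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def _as_rna(sequence: str) -> str:
    """Normalize DNA or RNA input to upper-case RNA letters."""
    return sequence.upper().replace("T", "U")


def upstream_aug_positions(utr: str) -> list:
    """0-based positions of every AUG triplet in a 5' UTR."""
    rna = _as_rna(utr)
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]


def find_upstream_orfs(utr: str) -> list:
    """(start, end) pairs for AUG-initiated reading frames that reach an
    in-frame stop codon inside the UTR; end is exclusive."""
    rna = _as_rna(utr)
    orfs = []
    for start in upstream_aug_positions(rna):
        for pos in range(start + 3, len(rna) - 2, 3):
            if rna[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs
```

A UTR for which both functions return an empty list lacks this particular class of negative element. Other elements, such as stable stem-loops, would need separate analysis.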
Further, the polypeptide-encoding segments of the polynucleotides of the
present
invention can be modified to alter codon usage. Altered codon usage can be
employed to
alter translational efficiency and/or to optimize the coding sequence for
expression in a
desired host such as to optimize the codon usage in a heterologous sequence
for expression
in maize. Codon usage in the coding regions of the polynucleotides of the
present invention
can be analyzed statistically using commercially available software packages
such as
"Codon Preference" available from the University of Wisconsin Genetics
Computer Group
(see Devereux et al., Nucleic Acids Res. 12: 387-395 (1984)) or MacVector 4.1
(Eastman
Kodak Co., New Haven, Conn.). Thus, the present invention provides a codon
usage
frequency characteristic of the coding region of at least one of the
polynucleotides of the
present invention. The number of polynucleotides that can be used to determine
a codon
usage frequency can be any integer from 1 to the number of polynucleotides of
the present
invention as provided herein. Optionally, the polynucleotides will be full-
length sequences.
An exemplary number of sequences for statistical analysis can be at least 1,
5, 10, 20, 50, or
100.
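A codon usage frequency of the kind described can be tabulated directly from one or more coding sequences. The following Python sketch is not the "Codon Preference" program or MacVector. It is a minimal stand-in that pools codon counts across the supplied sequences and reports each codon's share of all codons counted. The function name is hypothetical, and the sketch assumes each sequence starts in frame at its first base.

```python
from collections import Counter


def codon_usage_frequency(coding_sequences):
    """Relative frequency of each codon, pooled over in-frame coding sequences."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return {codon: n / total for codon, n in counts.items()}
```

The list passed in may hold a single sequence or any larger number. Frequencies computed per amino acid (relative synonymous usage) would require an additional genetic-code table and are outside this sketch.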
Sequence Shuffling
The present invention provides methods for sequence shuffling using
polynucleotides of the present invention, and compositions resulting
therefrom. Sequence
shuffling is described in PCT publication No. WO 97/20078. See also, Zhang, J.-
H., et al.
Proc. Natl. Acad. Sci. USA 94:4504-4509 (1997). Generally, sequence shuffling
provides a
means for generating libraries of polynucleotides having a desired
characteristic which can
be selected or screened for. Libraries of recombinant polynucleotides are
generated from a
population of related sequence polynucleotides which comprise sequence regions
which
have substantial sequence identity and can be homologously recombined in vitro
or in vivo.
The population of sequence-recombined polynucleotides comprises a
subpopulation of
polynucleotides which possess desired or advantageous characteristics and
which can be
selected by a suitable selection or screening method. The characteristics can
be any
property or attribute capable of being selected for or detected in a screening
system, and
may include properties of: an encoded protein, a transcriptional element, a
sequence
controlling transcription, RNA processing, RNA stability, chromatin
conformation,
translation, or other expression property of a gene or transgene, a
replicative element, a
protein-binding element, or the like, such as any feature which confers a
selectable or
detectable property. In some embodiments, the selected characteristic will be
a decreased
Km and/or increased kcat over the wild-type protein as provided herein. In
other
embodiments, a protein or polynucleotide generated from sequence shuffling
will have a
ligand binding affinity greater than the non-shuffled wild-type
polynucleotide. The increase
in such properties can be at least 110%, 120%, 130%, 140% or at least 150% of
the wild-
type value.
Generic and Consensus Sequences
Polynucleotides and polypeptides of the present invention further include
those
having: (a) a generic sequence of at least two homologous polynucleotides or
polypeptides,
respectively, of the present invention; and, (b) a consensus sequence of at
least three
homologous polynucleotides or polypeptides, respectively, of the present
invention. The
generic sequence of the present invention comprises each species of
polypeptide or
polynucleotide embraced by the generic polypeptide or polynucleotide sequence,
respectively. The individual species encompassed by a polynucleotide having an
amino
acid or nucleic acid consensus sequence can be used to generate antibodies or
produce
nucleic acid probes or primers to screen for homologs in other species,
genera, families,
orders, classes, phyla, or kingdoms. For example, a polynucleotide having a
consensus
sequence from a gene family of Zea mays can be used to generate antibody or
nucleic acid
probes or primers to other Gramineae species such as wheat, rice, or
sorghum.
Alternatively, a polynucleotide having a consensus sequence generated from
orthologous
genes can be used to identify or isolate orthologs of other taxa. Typically, a
polynucleotide
having a consensus sequence will be at least 9, 10, 15, 20, 25, 30, or 40
amino acids in
length, or 20, 30, 40, 50, 100, or 150 nucleotides in length. As those of
skill in the art are
aware, a conservative amino acid substitution can be used for amino acids
which differ
amongst aligned sequence but are from the same conservative substitution group
as
discussed above. Optionally, no more than 1 or 2 conservative amino acids are
substituted
for each 10 amino acid length of consensus sequence.
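As a rough illustration of how a consensus sequence can be derived from aligned homologs, the Python sketch below takes the most common residue in each alignment column. The strict-majority rule, the `min_fraction` parameter, and the "X" placeholder for columns without a majority are assumptions of this sketch and are not definitions taken from the text. Conservative substitution groups are not modeled.

```python
from collections import Counter


def consensus_sequence(aligned, min_fraction=0.5, ambiguous="X"):
    """Column-wise consensus of equal-length aligned sequences.

    A residue is emitted only if it occupies more than `min_fraction` of the
    column; otherwise the `ambiguous` symbol is written.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if count / len(aligned) > min_fraction else ambiguous)
    return "".join(consensus)
```

With three aligned sequences, a residue shared by two of them is written into the consensus, while a column in which no residue holds a majority is marked as ambiguous.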
Similar sequences used for generation of a consensus or generic sequence
include
any number and combination of allelic variants of the same gene, orthologous,
or
paralogous sequences as provided herein. Optionally, similar sequences used in
generating
a consensus or generic sequence are identified using the BLAST algorithm's
smallest sum
probability (P(N)). Various suppliers of sequence-analysis software are listed
in chapter 7
of Current Protocols in Molecular Biology, F.M. Ausubel et al., Eds., Current
Protocols, a
joint venture between Greene Publishing Associates, Inc. and John Wiley &
Sons, Inc.
(Supplement 30). A polynucleotide sequence is considered similar to a
reference sequence
if the smallest sum probability in a comparison of the test nucleic acid to
the reference
nucleic acid is less than about 0.1, more preferably less than about 0.01, or
0.001, and most
preferably less than about 0.0001, or 0.00001. Similar polynucleotides can be
aligned and a
consensus or generic sequence generated using multiple sequence alignment
software
available from a number of commercial suppliers such as the Genetics Computer
Group's
(Madison, WI) PILEUP software, Vector NTI's (North Bethesda, MD) ALIGNX, or
Genecode's (Ann Arbor, MI) SEQUENCHER. Conveniently, default parameters of
such
software can be used to generate consensus or generic sequences.
Machine Applications
The present invention provides machines, data structures, and processes for
modeling or analyzing the polynucleotides and polypeptides of the present
invention.
A. Machines: Data, Data Structures, Processes, and Functions
The present invention provides a machine having a memory comprising: 1) data
representing a sequence of a polynucleotide or polypeptide of the present
invention, 2) a
data structure which reflects the underlying organization and structure of the
data and
facilitates program access to data elements corresponding to logical sub-
components of the
sequence, 3) processes for effecting the use, analysis, or modeling of the
sequence, and 4)
optionally, a function or utility for the polynucleotide or polypeptide. Thus,
the present
invention provides a memory for storing data that can be accessed by a
computer
programmed to implement a process for effecting the use, analyses, or modeling
of a
sequence of a polynucleotide, with the memory comprising data representing the
sequence
of a polynucleotide of the present invention.
The machine of the present invention is typically a digital computer. The term
"computer" includes one or several desktop or portable computers, computer
workstations,
servers (including intranet or Internet servers), mainframes, and any
integrated system
comprising any of the above irrespective of whether the processing,
memory, input, or
output of the computer is remote or local, as well as any networking
interconnecting the
modules of the computer. The term "computer" is exclusive of computers of the
United
States Patent and Trademark Office or the European Patent Office when data
representing
the sequence of polypeptides or polynucleotides of the present invention is
used for
patentability searches.
The present invention contemplates providing as data a sequence of a
polynucleotide
of the present invention embodied in a computer readable medium. As those of
skill in the
art will be aware, the form of memory of a machine of the present invention,
or the
particular embodiment of the computer readable medium, are not critical
elements of the
invention and can take a variety of forms. The memory of such a machine
includes, but is
not limited to, ROM, or RAM, or computer readable media such as, but not
limited to,
magnetic media such as computer disks or hard drives, or media such as CD-
ROMs, DVDs,
and the like.
The present invention further contemplates providing a data structure that is
also
contained in memory. The data structure may be defined by the computer
programs that
define the processes (see below) or it may be defined by the programming of
separate data
storage and retrieval programs, subroutines, or systems. Thus, the present
invention
provides a memory for storing a data structure that can be accessed by a
computer
programmed to implement a process for effecting the use, analysis, or modeling
of a
sequence of a polynucleotide. The memory comprises data representing a
polynucleotide
having the sequence of a polynucleotide of the present invention. The data is
stored within
memory. Further, a data structure, stored within memory, is associated with
the data
reflecting the underlying organization and structure of the data to facilitate
program access
to data elements corresponding to logical sub-components of the sequence. The
data
structure enables the polynucleotide to be identified and manipulated by such
programs.
In a further embodiment, the present invention provides a data structure that
contains data representing a sequence of a polynucleotide of the present
invention stored
within a computer readable medium. The data structure is organized to reflect
the logical
structuring of the sequence, so that the sequence is easily analyzed by
software programs
capable of accessing the data structure. In particular, the data structures of
the present
invention organize the reference sequences of the present invention in a
manner which
allows software tools to perform a wide variety of analyses using logical
elements and sub-
elements of each sequence.
An example of such a data structure resembles a layered hash table, where in
one
dimension the base content of the sequence is represented by a string of
elements A, T, C, G
and N. The direction from the 5' end to the 3' end is reflected by the order
from the
position 0 to the position of the length of the string minus one. Such a
string, corresponding
to a nucleotide sequence of interest, has a certain number of substrings, each
of which is
delimited by the string position of its 5' end and the string position of its
3' end within the
parent string. In a second dimension, each substring is associated with or
pointed to one or
multiple attribute fields. Such attribute fields contain annotations to the
region on the
nucleotide sequence represented by the substring.
For example, a sequence under investigation is 520 bases long and represented
by a
string named SeqTarget. There is a minor groove in the 5' upstream non-coding
region
from position 12 to 38, which is identified as a binding site for an enhancer
protein HM-A,
which in turn will increase the transcription of the gene represented by
SeqTarget. Here,
the substring is represented as (12, 38) and has the following attributes:
[upstream
uncoded], [minor groove], [HM-A binding] and [increase transcription upon
binding by
HM-A]. Similarly, other types of information can be stored and structured in
this manner,
such as information related to the whole sequence, e.g., whether the sequence
is a full
length viral gene, a mammalian house keeping gene or an EST from clone X,
information
related to the 3' downstream non-coding region, e.g., hairpin structure, and
information
related to various domains of the coding region, e.g., Zinc finger.
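The layered structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`AnnotatedSequence`, `annotate`, `attributes_at`, `flatten`) are hypothetical, positions are treated as 0-based and inclusive as in the text, and the 520-character string is a placeholder because the actual bases of SeqTarget are not given.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSequence:
    """Hypothetical sketch: a base string plus attribute fields keyed by substring."""
    name: str
    bases: str  # first dimension: A, T, C, G, N read 5' to 3', positions 0..len-1
    # second dimension: (start, end) of a substring -> list of attribute fields
    regions: dict = field(default_factory=dict)

    def annotate(self, start, end, *attributes):
        """Attach attribute fields to the substring delimited by start and end (inclusive)."""
        if not (0 <= start <= end < len(self.bases)):
            raise ValueError("substring must lie within the parent string")
        self.regions.setdefault((start, end), []).extend(attributes)

    def attributes_at(self, position):
        """Return every attribute whose substring covers the given position."""
        return [attr
                for (start, end), attrs in self.regions.items()
                if start <= position <= end
                for attr in attrs]

    def flatten(self):
        """Trim the structure down to the 1-D string for downstream analysis."""
        return self.bases


# Placeholder 520-base string; the real bases are not given in the text.
seq_target = AnnotatedSequence("SeqTarget", "N" * 520)
seq_target.annotate(12, 38,
                    "upstream uncoded", "minor groove", "HM-A binding",
                    "increase transcription upon binding by HM-A")
print(seq_target.attributes_at(20))
```

Keying the second dimension on the (start, end) pair keeps the base string itself untouched, so the 1-D view remains available at any time.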
This data structure is an open structure and is robust enough to accommodate
newly
generated data and acquired knowledge. Such a structure is also a flexible
structure. It can
be trimmed down to a 1-D string to facilitate data mining and analysis steps,
such as
clustering, repeat-masking, and HMM analysis. Meanwhile, such a data structure
also can
extend the associated attributes into multiple dimensions. Pointers can be
established
among the dimensioned attributes when needed to facilitate data management and
processing in a comprehensive genomics knowledgebase. Furthermore, such a data
structure is object-oriented. Polymorphism can be represented by a family or
class of
sequence objects, each of which has an internal structure as discussed above.
The common
traits are abstracted and assigned to the parent object, whereas each child
object represents a
specific variant of the family or class. Such a data structure allows data to
be efficiently
retrieved, updated and integrated by the software applications associated with
the sequence
database and/or knowledgebase.
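The object-oriented arrangement described above, with common traits on a parent object and one child per variant, might look like the following sketch. All class names, attribute strings, and the short base strings are hypothetical placeholders chosen for illustration; none is taken from the sequences of the invention.

```python
class SequenceFamily:
    """Parent object: holds the traits shared by every variant (hypothetical)."""
    common_attributes = ("shared trait of the family",)
    variant_attributes = ()

    def __init__(self, bases):
        self.bases = bases

    def attributes(self):
        """Common traits come from the parent, specific traits from the child."""
        return list(self.common_attributes) + list(self.variant_attributes)


class VariantOne(SequenceFamily):
    """Child object: one specific variant of the family (hypothetical)."""
    variant_attributes = ("variant 1 specific annotation",)


class VariantTwo(SequenceFamily):
    """Child object: a second variant of the same family (hypothetical)."""
    variant_attributes = ("variant 2 specific annotation",)


family = [VariantOne("ATGGCC"), VariantTwo("ATGGCT")]
for member in family:
    print(type(member).__name__, member.attributes())
```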
The present invention contemplates providing processes for effecting analysis
and
modeling, which are described in the following section.
Optionally, the present invention further contemplates that the machine of the
present invention will embody in some manner a utility or function for the
polynucleotide or
polypeptide of the present invention. The function or utility of the
polynucleotide or
polypeptide can be a function or utility for the sequence data, per se, or of
the tangible
material. Exemplary functions or utilities include the name (per International
Union of
Biochemistry and Molecular Biology rules of nomenclature) or function of the
enzyme or
protein represented by the polynucleotide or polypeptide of the present
invention; the
metabolic pathway of the protein represented by the polynucleotide or
polypeptide of the
present invention; the substrate or product or structural role of the protein
represented by the
polynucleotide or polypeptide of the present invention; or, the phenotype
(e.g., an
agronomic or pharmacological trait) affected by modulating expression or
activity of the
protein represented by the polynucleotide or polypeptide of the present
invention.
B. Computer Analysis and Modeling
The present invention provides a process of modeling and analyzing data
representative of a polynucleotide or polypeptide sequence of the present
invention. The
process comprises entering sequence data of a polynucleotide or polypeptide of
the present
invention into a machine having a hardware or software sequence modeling and
analysis
system, developing data structures to facilitate access to the sequence data,
manipulating the
data to model or analyze the structure or activity of the polynucleotide or
polypeptide, and
displaying the results of the modeling or analysis. Thus, the present
invention provides a
process for effecting the use, analysis, or modeling of a polynucleotide
sequence or its
derived peptide sequence through use of a computer having a memory. The
process
comprises 1) placing into the memory data representing a polynucleotide having
the
sequence of a polynucleotide of the present invention,
developing within the memory a data structure associated with the data and
reflecting the
underlying organization and structure of the data to facilitate program access
to data
elements corresponding to logical sub-components of the sequence, 2)
programming the
computer with a program containing instructions sufficient to implement the
process for
effecting the use, analysis, or modeling of the polynucleotide sequence or the
peptide
sequence, and, 3) executing the program on the computer while granting the
program access
to the data and to the data structure within the memory.
A variety of modeling and analytic tools are well known in the art and
available
commercially. Included amongst the modeling/analysis tools are methods to: 1)
recognize
overlapping sequences (e.g., from a sequencing project) with a polynucleotide
of the present
invention and create an alignment called a "contig"; 2) identify restriction
enzyme sites of a
polynucleotide of the present invention; 3) identify the products of a T1
ribonuclease
digestion of a polynucleotide of the present invention; 4) identify PCR
primers with
minimal self-complementarity; 5) compute pairwise distances between sequences
in an
alignment, reconstruct phylogenetic trees using distance methods, and calculate
the degree of
divergence of two protein coding regions; 6) identify patterns such as coding
regions,
terminators, repeats, and other consensus patterns in polynucleotides of the
present
invention; 7) identify RNA secondary structure; 8) identify sequence motifs,
isoelectric
point, secondary structure, hydrophobicity, and antigenicity in polypeptides
of the present
invention; 9) translate polynucleotides of the present invention and
backtranslate
polypeptides of the present invention; and 10) compare two protein or nucleic
acid
sequences and identify points of similarity or dissimilarity between them.
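As a small illustration of item 2 in the list above (identifying restriction enzyme sites), the following sketch scans a sequence for a recognition site. It is not the method of any of the commercial packages named in this section; the function names and the 24-base toy sequence are hypothetical, and only the recognition sequences of EcoRI (GAATTC) and SalI (GTCGAC) are standard facts.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_sites(seq, recognition_site):
    """Return 0-based start positions of every occurrence, overlapping ones included."""
    positions = []
    start = seq.find(recognition_site)
    while start != -1:
        positions.append(start)
        start = seq.find(recognition_site, start + 1)
    return positions


# Toy sequence for illustration only; it is not a sequence of the invention.
toy = "AAGAATTCGGGTCGACTTGAATTC"
print("EcoRI:", find_sites(toy, "GAATTC"))
print("SalI: ", find_sites(toy, "GTCGAC"))
```

Both sites used here are palindromic, so scanning one strand is sufficient; for a non-palindromic site the reverse complement of the site would also need to be searched.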
The processes for effecting analysis and modeling can be produced
independently or
obtained from commercial suppliers. Exemplary analysis and modeling tools are
provided
in products such as InforMax's (Bethesda, MD) Vector NTI Suite (Version 5.5),
Intelligenetics' (Mountain View, CA) PC/Gene program, and Genetics Computer
Group's
(Madison, WI) Wisconsin Package (Version 10.0).
Thus, in a further embodiment, the present invention provides a
machine-readable
media containing a computer program and data, comprising a program stored on
the media
containing instructions sufficient to implement a process for effecting the
use, analysis, or
modeling of a representation of a polynucleotide or peptide sequence. The data
stored on
the media represents a sequence of a polynucleotide having the sequence of a
polynucleotide of the present invention. The media also includes a data
structure reflecting
the underlying organization and structure of the data to facilitate program
access to data
elements corresponding to logical sub-components of the sequence, the data
structure being
inherent in the program and in the way in which the program organizes and
accesses the
data.
C. Homology Searches
As an example of such a comparative analysis, the present invention provides a
process of identifying a candidate homologue (i.e., an ortholog or paralog) of
a
polynucleotide or polypeptide of the present invention. The process comprises
entering
sequence data of a polynucleotide or polypeptide of the present invention into
a machine
having a hardware or software sequence analysis system, developing data
structures to
facilitate access to the sequence data, manipulating the data to analyze the
structure of the
polynucleotide or polypeptide, and displaying the results of the analysis. A
candidate
homologue has a statistically significant probability of having the same
biological function
(e.g., catalyzes the same reaction, binds to homologous proteins/nucleic
acids, has a similar
structural role) as the reference sequence to which it is compared.
Accordingly, the
polynucleotides and polypeptides of the present invention have utility in
identifying
homologs in animals or other plant species, particularly those in the family
Gramineae such
as, but not limited to, sorghum, wheat, or rice.
The process of the present invention comprises obtaining data representing a
polynucleotide or polypeptide test sequence. Test sequences can be obtained
from a nucleic
acid of an animal or plant. Test sequences can be obtained directly or
indirectly from
sequence databases including, but not limited to, those such as: GenBank,
EMBL, GenSeq,
SWISS-PROT, or those available on-line via the UK Human Genome Mapping Project
(HGMP) GenomeWeb. In some embodiments the test sequence is obtained from a
plant
species other than maize whose function is uncertain but will be compared to
the reference
sequence to determine sequence similarity or sequence identity. The test
sequence data is
entered into a machine, such as a computer, containing: i) data representing a
reference
sequence and, ii) a hardware or software sequence comparison system to compare
the
reference and test sequence for sequence similarity or identity.
Exemplary sequence comparison systems are provided for in sequence analysis
software such as those provided by the Genetics Computer Group (Madison, WI)
or
InforMax (Bethesda, MD), or Intelligenetics (Mountain View, CA). Optionally,
sequence
comparison is established using the BLAST or GAP suite of programs. Generally,
a
smallest sum probability value (P(N)) of less than 0.1, or alternatively, less
than 0.01, 0.001,
0.0001, or 0.00001 using the BLAST 2.0 suite of algorithms under default
parameters
identifies the test sequence as a candidate homologue (i.e., an allele,
ortholog, or paralog) of
the reference sequence. Those of skill in the art will recognize that a
candidate homologue
has an increased statistical probability of having the same or similar
function as the
gene/protein represented by the test sequence.
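The thresholding step described above can be sketched as follows. The code does not run BLAST or compute P(N); it assumes the smallest sum probabilities have already been obtained from a BLAST 2.0 run, and the hit names and probability values are invented placeholders. Only the threshold values (0.1 down to 0.00001) come from the text.

```python
# Thresholds listed in the text, from most permissive to most stringent.
THRESHOLDS = (0.1, 0.01, 0.001, 0.0001, 0.00001)


def strictest_threshold_met(p_value, thresholds=THRESHOLDS):
    """Return the smallest listed threshold that p_value falls below, or None."""
    met = [t for t in thresholds if p_value < t]
    return min(met) if met else None


def candidate_homologues(hits, threshold=0.1):
    """Keep the test sequences whose P(N) is less than the chosen threshold."""
    return [name for name, p_value in hits.items() if p_value < threshold]


# Hypothetical P(N) values; real ones would come from the BLAST output.
hits = {"test_seq_1": 3e-7, "test_seq_2": 0.04, "test_seq_3": 0.5}
print(candidate_homologues(hits))
print(candidate_homologues(hits, threshold=0.001))
```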
The reference sequence can be the sequence of a polypeptide or a
polynucleotide of
the present invention. The reference and test sequences are each optionally at
least 25 amino
acids or at least 100 nucleotides in length. The length of the reference or
test sequences can
be the length of the polynucleotide or polypeptide described, respectively,
above in the
sections entitled "Nucleic Acids" (particularly section (g)), and "Proteins".
As those of skill
in the art are aware, the greater the sequence identity/similarity between a
reference
sequence of known function and a test sequence, the greater the probability
that the test
sequence will have the same or similar function as the reference sequence. The
results of
the comparison between the test and reference sequences are outputted (e.g.,
displayed,
printed, recorded) via any one of a number of output devices and/or media
(e.g., computer
monitor, hard copy, or computer readable medium).
Detection of Nucleic Acids
The present invention further provides methods for detecting a polynucleotide
of the
present invention in a nucleic acid sample suspected of containing a
polynucleotide of the
present invention, such as a plant cell lysate, particularly a lysate of
maize. In some
embodiments, a cognate gene of a polynucleotide of the present invention or
portion thereof
can be amplified prior to the step of contacting the nucleic acid sample with
a
polynucleotide of the present invention. The nucleic acid sample is contacted
with the
polynucleotide to form a hybridization complex. The polynucleotide hybridizes
under
stringent conditions to a gene encoding a polypeptide of the present
invention. Formation of
the hybridization complex is used to detect a gene encoding a polypeptide of
the present
invention in the nucleic acid sample. Those of skill will appreciate that an
isolated nucleic
acid comprising a polynucleotide of the present invention should lack
cross-hybridizing
sequences in common with non-target genes that would yield a false positive
result.
Detection of the hybridization complex can be achieved using any number of
well known
methods. For example, the nucleic acid sample, or a portion thereof, may be
assayed by
hybridization formats including but not limited to, solution phase, solid
phase, mixed phase,
or in situ hybridization assays.
Detectable labels suitable for use in the present invention include any
composition
detectable by spectroscopic, radioisotopic, photochemical, biochemical,
immunochemical,
electrical, optical or chemical means. Useful labels in the present invention
include biotin
for staining with labeled streptavidin conjugate, magnetic beads, fluorescent
dyes,
radiolabels, enzymes, and colorimetric labels. Other labels include ligands
which bind to
antibodies labeled with fluorophores, chemiluminescent agents, and enzymes.
Labeling the
nucleic acids of the present invention is readily achieved such as by the use
of labeled PCR
primers.
Although the present invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be
obvious that
certain changes and modifications may be practiced within the scope of the
appended
claims.
Example 1
This example describes the construction of a cDNA library.
Total RNA can be isolated from maize tissues with TRIzol Reagent (Life
Technology Inc. Gaithersburg, MD) using a modification of the guanidine
isothiocyanate/acid-phenol procedure described by Chomczynski and Sacchi
(Chomczynski, P., and Sacchi, N. Anal. Biochem. 162, 156 (1987)). In brief,
plant tissue
samples are pulverized in liquid nitrogen before the addition of the TRIzol
Reagent, and then
further homogenized with a mortar and pestle. Addition of chloroform followed
by
centrifugation is conducted for separation of an aqueous phase and an organic
phase. The
total RNA is recovered by precipitation with isopropyl alcohol from the
aqueous phase.
The selection of poly(A)+ RNA from total RNA can be performed using PolyATtract
system (Promega Corporation. Madison, WI). Biotinylated oligo(dT) primers are
used to
hybridize to the 3' poly(A) tails on mRNA. The hybrids are captured using
streptavidin
coupled to paramagnetic particles and a magnetic separation stand. The mRNA is
then
washed at high stringency conditions and eluted by RNase-free deionized water.
cDNA synthesis and construction of unidirectional cDNA libraries can be
accomplished using the Superscript Plasmid System (Life Technology Inc.
Gaithersburg,
MD). The first strand of cDNA is synthesized by priming an oligo(dT) primer
containing a
Not I site. The reaction is catalyzed by Superscript Reverse Transcriptase II
at 45°C. The
second strand of cDNA is labeled with alpha-32P-dCTP and a portion of the
reaction
is analyzed by agarose gel electrophoresis to determine cDNA sizes. cDNA
molecules
smaller than 500 base pairs and unligated adapters are removed by
Sephacryl-S400
chromatography. The selected cDNA molecules are ligated into pSPORT1 vector
between the Not I and Sal I sites.
Alternatively, cDNA libraries can be prepared by any one of many methods
available. For example, the cDNAs may be introduced into plasmid vectors by
first
preparing the cDNA libraries in Uni-ZAP™ XR vectors according to the
manufacturer's
protocol (Stratagene Cloning Systems, La Jolla, CA). The Uni-ZAP™ XR
libraries are
converted into plasmid libraries according to the protocol provided by
Stratagene. Upon
conversion, cDNA inserts will be contained in the plasmid vector
pBluescript. In addition,
the cDNAs may be introduced directly into precut Bluescript II SK(+) vectors
(Stratagene)
using T4 DNA ligase (New England Biolabs), followed by transfection into DH10B
cells
according to the manufacturer's protocol (GIBCO BRL Products). Once the cDNA
inserts
are in plasmid vectors, plasmid DNAs are prepared from randomly picked
bacterial
colonies containing recombinant pBluescript plasmids, or the insert cDNA
sequences are
amplified via polymerase chain reaction using primers specific for vector
sequences
flanking the inserted cDNA sequences. Amplified insert DNAs or plasmid DNAs
are
sequenced in dye-primer sequencing reactions to generate partial cDNA
sequences
(expressed sequence tags or "ESTs"; see Adams et al., (1991) Science
252:1651-1656).
The resulting ESTs are analyzed using a Perkin Elmer Model 377 fluorescent
sequencer.
Example 2
This method describes construction of a full-length enriched cDNA library.
An enriched full-length cDNA library can be constructed using one of two
variations of the method of Carninci et al. Genomics 37: 327-336, 1996. These
variations
are based on chemical introduction of a biotin group into the diol residue of
the 5' cap
structure of eukaryotic mRNA to select full-length first strand cDNA. The
selection occurs
by trapping the biotin residue at the cap sites using streptavidin-coated
magnetic beads
followed by RNase I treatment to eliminate incompletely synthesized cDNAs.
Second
strand cDNA is synthesized using established procedures such as those provided
in Life
Technologies' (Rockville, MD) "Superscript Plasmid System for cDNA Synthesis
and
Plasmid Cloning" kit. Libraries made by this method have been shown to contain
50% to
70% full-length cDNAs.
The first strand synthesis methods are detailed below. An asterisk denotes
that the
reagent was obtained from Life Technologies, Inc.
A. First strand cDNA synthesis method 1 (with trehalose)
mRNA (10 µg)                    25 µl
*Not I primer (5 µg)            10 µl
*5x 1st strand buffer           43 µl
*0.1 M DTT                      20 µl
*dNTP mix 10 mM                 10 µl
BSA 10 µg/µl                     1 µl
Trehalose (saturated)           59.2 µl
RNase inhibitor (Promega)        1.8 µl
*Superscript II RT 200 u/µl     20 µl
100% glycerol                   18 µl
Water                            7 µl
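A quick arithmetic check of the reaction mix is sketched below. It assumes the volumes in the table read 25, 10, 43, 20, 10, 1, 59.2, 1.8, 20, 18 and 7 µl; this reading is an interpretation of the listed figures, supported by the fact that the 5x buffer then makes up exactly one fifth of the total. The dictionary name and keys are hypothetical.

```python
# Volumes in microliters, as read from the method 1 table (an interpretation).
METHOD_1_MIX_UL = {
    "mRNA (10 ug)": 25.0,
    "Not I primer (5 ug)": 10.0,
    "5x 1st strand buffer": 43.0,
    "0.1 M DTT": 20.0,
    "dNTP mix 10 mM": 10.0,
    "BSA 10 ug/ul": 1.0,
    "Trehalose (saturated)": 59.2,
    "RNase inhibitor": 1.8,
    "Superscript II RT 200 u/ul": 20.0,
    "100% glycerol": 18.0,
    "Water": 7.0,
}

total_ul = sum(METHOD_1_MIX_UL.values())
# A 5x buffer should be diluted five-fold in the final reaction.
buffer_fold = total_ul / METHOD_1_MIX_UL["5x 1st strand buffer"]

print(f"total volume: {total_ul:.1f} ul")
print(f"buffer dilution: {buffer_fold:.2f}-fold")
```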
The mRNA and Not I primer are mixed and denatured at 65°C for 10 min.
They are
then chilled on ice and other components are added to the tube. Incubation is at
45°C for 2 min. Twenty microliters of RT (reverse transcriptase) is added to the
reaction and the following program is started on the thermocycler (MJ Research,
Waltham, MA):
Step 1   45°C 10 min
Step 2   45°C, -0.3°C/cycle, 2 seconds/cycle
Step 3   go to 2 for 33 cycles
Step 4   35°C 5 min
Step 5   45°C 5 min
Step 6   45°C, 0.2°C/cycle, 1 sec/cycle
Step 7   go to 6 for 49 cycles
Step 8   55°C, 0.1°C/cycle, 12 sec/cycle
Step 9   go to 8 for 49 cycles
Step 10  55°C 2 min
Step 11  60°C 2 min
Step 12  go to 11 for 9 times
Step 13  4°C forever
Step 14  end
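The incrementing steps of the program can be checked with a short simulation. This sketch makes two assumptions that are not stated in the text: the temperature change is applied once per pass starting after the first pass, and the loop at step 7 returns to step 6 (a loop onto itself could not terminate). The function name `ramp` is hypothetical.

```python
def ramp(start_c, delta_c, repeats):
    """Temperatures for one incrementing step: the first pass plus `repeats` loop-backs."""
    return [round(start_c + n * delta_c, 1) for n in range(repeats + 1)]


# Steps 2-3: start at 45 C, -0.3 C per cycle, repeated 33 more times.
cool_down = ramp(45.0, -0.3, 33)
# Steps 6-7: start at 45 C, +0.2 C per cycle; step 7 is ASSUMED to loop back to step 6.
warm_up = ramp(45.0, 0.2, 49)
# Steps 8-9: start at 55 C, +0.1 C per cycle, repeated 49 more times.
final_ramp = ramp(55.0, 0.1, 49)

for label, temps in (("steps 2-3", cool_down),
                     ("steps 6-7", warm_up),
                     ("steps 8-9", final_ramp)):
    print(f"{label}: {len(temps)} passes, {temps[0]} C -> {temps[-1]} C")
```

Under these assumptions each ramp ends within 0.2°C of the hold temperature of the step that follows it (35°C, 55°C and 60°C respectively), which is consistent with the program as listed.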
B. First strand cDNA synthesis method 2
mRNA (10 µg)                    25 µl
water                           30 µl
*Not I adapter primer (5 µg)    10 µl
65°C for 10 min, chill on ice, then add following reagents,
*5x first buffer                20 µl
*0.1 M DTT                      10 µl
*10 mM dNTP mix                  5 µl
Incubate at 45°C for 2 min, then add 10 µl of *Superscript II RT
(200 u/µl), start the
following program:
Step 1 45°C for 6 sec, -0.1°C/cycle
Step 2 go to 1 for 99 additional cycles
Step 3 35°C for 5 min
Step 4 45°C for 60 min
Step 5 50°C for 10 min
Step 6 4°C forever
Step 7 end
After the 1st strand cDNA synthesis, the DNA is extracted by phenol according
to
standard procedures, and then precipitated in NaOAc and ethanol, and stored at
-20°C.
C. Oxidization of the diol group of mRNA for biotin labeling
First strand cDNA is spun down and washed once with 70% EtOH. The pellet is
resuspended in 23.2 µl of DEPC treated water and put on ice. Prepare 100 mM
of NaIO4
freshly, and then add the following reagents:
mRNA:1st cDNA (start with 20 µg mRNA)   46.4 µl
100 mM NaIO4 (freshly made)              2.5 µl
NaOAc 3M pH 4.5                          1.1 µl
To make 100 mM NaIO4, use 21.39 µg of NaIO4 for 1 µl of water.
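The figure of 21.39 µg per µl follows from the molar mass of sodium periodate, as the short calculation below shows. The atomic masses are standard values rounded to three decimals; the function names are hypothetical.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
ATOMIC_MASS = {"Na": 22.990, "I": 126.904, "O": 15.999}


def molar_mass(composition):
    """Sum atomic masses for a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def mass_ug_per_ul(molar_mass_g_per_mol, concentration_mM):
    """Mass of solute per microliter; g/L and ug/ul are numerically equal."""
    return molar_mass_g_per_mol * concentration_mM / 1000.0


naio4 = molar_mass({"Na": 1, "I": 1, "O": 4})
print(f"NaIO4 molar mass: {naio4:.2f} g/mol")
print(f"100 mM NaIO4: {mass_ug_per_ul(naio4, 100):.2f} ug per ul")
```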
Wrap the tube in a foil and incubate on ice for 45min.
After the incubation, the reaction is then precipitated in:
5M NaCl        10 µl
20% SDS        0.5 µl
isopropanol    61 µl
Incubate on ice for at least 30 min, then spin it down at max speed at
4°C for 30 min and
wash once with 70% ethanol and then 80% EtOH.
D. Biotinylation of the mRNA diol group
Resuspend the DNA in 110 µl DEPC treated water, then add the following
reagents:
20% SDS                                5 µl
2 M NaOAc pH 6.1                       5 µl
10 mM biotin hydrazide (freshly made)  300 µl
Wrap in a foil and incubate at room temperature overnight.
E. RNase I treatment
Precipitate DNA in:
5M NaCl                     10 µl
2M NaOAc pH 6.1             75 µl
biotinylated mRNA:cDNA      420 µl
100% EtOH (2.5 vol)         1262.5 µl
(Perform this precipitation in two tubes and split the 420 µl of DNA into 210
µl each, add
5 µl of 5M NaCl, 37.5 µl of 2M NaOAc pH 6.1, and 631.25 µl of 100% EtOH).
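The volumes in this precipitation are internally consistent, which the sketch below makes explicit: 2.5 volumes of ethanol are calculated on the combined sample, NaCl and NaOAc volume, and every component is halved when the mix is split into two tubes. The function and key names are hypothetical.

```python
def ethanol_precipitation(sample_ul, nacl_ul, naoac_ul, etoh_volumes=2.5, tubes=1):
    """Return per-tube volumes (ul); ethanol is a multiple of the aqueous volume."""
    aqueous_ul = sample_ul + nacl_ul + naoac_ul
    totals = {
        "sample": sample_ul,
        "5M NaCl": nacl_ul,
        "2M NaOAc": naoac_ul,
        "100% EtOH": aqueous_ul * etoh_volumes,
    }
    return {name: volume / tubes for name, volume in totals.items()}


whole = ethanol_precipitation(420, 10, 75)
per_tube = ethanol_precipitation(420, 10, 75, tubes=2)
print(whole)
print(per_tube)
```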
Store at -20°C for at least 30 min. Spin the DNA down at 4°C at
maximal speed for 30 min.
and wash with 80% EtOH twice, then dissolve DNA in 70 µl RNase free water.
Pool two
tubes and end up with 140 µl.
Add the following reagents:
RNase One 10 U/µl    40 µl
1st cDNA:RNA         140 µl
10X buffer           20 µl
Incubate at 37°C for 15 min.
Add 5 µl of 40 µg/µl yeast tRNA to each sample for capturing.
F. Full length 1st cDNA capturing
Blocking the beads with yeast tRNA:
Beads                   1 ml
Yeast tRNA 40 µg/µl     5 µl
Incubate on ice for 30 min with mixing, wash 3 times with 1 ml of 2M NaCl,
50 mM EDTA, pH 8.0.
Resuspend the beads in 800 µl of 2M NaCl, 50 mM EDTA, pH 8.0, add RNase I
treated sample 200 µl, and incubate the reaction for 30 min at room temperature.
Capture the beads using the magnetic stand, save the supernatant, and start
following
washes:
2 washes with 2M NaCl, 50 mM EDTA, pH 8.0, 1 ml each time,
1 wash with 0.4% SDS, 50 µg/ml tRNA,
1 wash with 10 mM Tris-Cl pH 7.5, 0.2 mM EDTA, 10 mM NaCl, 20% glycerol,
1 wash with 50 µg/ml tRNA,
1 wash with 1st cDNA buffer
G. Second strand cDNA synthesis
Resuspend the beads in:
*5X first buffer                  8 µl
*0.1 mM DTT                       4 µl
*10 mM dNTP mix                   8 µl
*5X 2nd buffer                   60 µl
*E. coli Ligase 10 U/µl           2 µl
*E. coli DNA polymerase 10 U/µl   8 µl
*E. coli RNaseH 2 U/µl            2 µl
P32 dCTP 10 µCi/µl                2 µl
Or water up to 300 µl           208 µl
Incubate at 16°C for 2 hr, mixing the reaction every 30 min.
Add 4 µl of T4 DNA polymerase and incubate for an additional 5 min at 16°C.
Elute 2nd cDNA from the beads.
Use a magnetic stand to separate the 2nd cDNA from the beads, then resuspend
the beads in
200 µl of water, and then separate again, pool the samples (about 500 µl).
Add 200 µl of water to the beads, then 200 µl of phenol:chloroform, vortex,
and spin to
separate the sample with phenol.
Pool the DNA together (about 700 µl) and use phenol to clean the DNA again. DNA
is then
precipitated in 2 µg of glycogen and 0.5 vol of 7.5M NH4OAc and 2 vol of 100%
EtOH.
Precipitate overnight. Spin down the pellet and wash with 70% EtOH, air-dry
the pellet.
DNA                 250 µl        DNA                 200 µl
7.5M NH4OAc         125 µl        7.5M NH4OAc         100 µl
100% EtOH           750 µl        100% EtOH           600 µl
glycogen 1 µg/µl      2 µl        glycogen 1 µg/µl      2 µl
H. Sal I adapter ligation
Resuspend the pellet in 26 µl of water and use 1 µl for TAE gel.
Set up reaction as following:
2nd strand cDNA             25 µl
*5X T4 DNA ligase buffer    10 µl
*Sal I adapters             10 µl
*T4 DNA ligase               5 µl
Mix gently, incubate the reaction at 16°C overnight.
Add 2 µl of ligase on the second day and incubate at room temperature for 2 hrs
(optional).
Add 50 µl water to the reaction and use 100 µl of phenol to clean the DNA. 90 µl
of the upper
phase is transferred into a new tube and precipitated in:
Glycogen 1 µg/µl     2 µl
Upper phase DNA      90 µl
7.5M NH4OAc          50 µl
100% EtOH            300 µl
precipitate at -20°C overnight
Spin down the pellet at 4°C and wash in 70% EtOH, dry the pellet.
I. Not I digestion
2nd cDNA             41 µl
Reaction 3 buffer     5 µl
*Not I 15 u/µl        4 µl
Mix gently and incubate the reaction at 37°C for 2hr.
Add 50 µl of water and 100 µl of phenol, vortex, and take 90 µl of the upper
phase to a new
tube, then add 50 µl of NH4OAc and 300 µl of EtOH. Precipitate overnight at
-20°C.
Cloning, ligation, and transformation are performed per the Superscript cDNA
synthesis kit.
Example 3
This example describes cDNA sequencing and library subtraction.
Individual colonies can be picked and DNA prepared either by PCR with M13
forward primers and M13 reverse primers, or by plasmid isolation. cDNA clones
can be
sequenced using M13 reverse primers.
cDNA libraries are plated out on 22 x 22 cm² agar plates at a density of about
3,000
colonies per plate. The plates are incubated in a 37°C incubator for
12-24 hours. Colonies
are picked into 384-well plates by a robot colony picker, Q-bot (GENETIX
Limited). These
plates are incubated overnight at 37°C. Once sufficient colonies are
picked, they are pinned
onto 22 x 22 cm² nylon membranes using Q-bot. Each membrane holds 9,216 or
36,864
colonies. These membranes are placed onto an agar plate with an appropriate
antibiotic.
The plates are incubated at 37°C overnight.
After colonies are recovered on the second day, these filters are placed on
filter
paper prewetted with denaturing solution for four minutes, then incubated on
top of a
boiling water bath for an additional four minutes. The filters are then placed
on filter paper
prewetted with neutralizing solution for four minutes. After excess solution
is removed by
placing the filters on dry filter papers for one minute, the colony side of
the filters is placed
into Proteinase K solution, incubated at 37°C for 40-50 minutes. The
filters are placed on
dry filter papers to dry overnight. DNA is then cross-linked to nylon membrane
by UV
light treatment.
Colony hybridization is conducted as described by Sambrook,J., Fritsch, E.F.
and
Maniatis, T., (in Molecular Cloning: A Laboratory Manual, 2nd Edition).
The following
probes can be used in colony hybridization:
1. First strand cDNA from the same tissue as the library was made from to
remove
the most redundant clones.
2. 48-192 most redundant cDNA clones from the same library based on previous
sequencing data.
3. 192 most redundant cDNA clones in the entire maize sequence database.
4. A Sal-A20 oligonucleotide: TCG ACC CAC GCG TCC GAA AAA AAA AAA
AAA AAA AAA, removes clones containing a poly A tail but no cDNA.
5. cDNA clones derived from rRNA.
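The Sal-A20 probe in item 4 is a 16-base adapter-derived sequence followed directly by 20 adenines, so a clone it matches in full has a poly(A) tract joined to the adapter with no cDNA in between. The sketch below is an in silico analogue only, not the hybridization procedure itself; the function name and the example insert strings are hypothetical.

```python
# Probe sequence from item 4, with the triplet spacing removed.
SAL_A20 = "TCG ACC CAC GCG TCC GAA AAA AAA AAA AAA AAA AAA".replace(" ", "")
ADAPTER_PART, POLY_A_PART = SAL_A20[:16], SAL_A20[16:]


def looks_like_empty_clone(insert_seq, probe=SAL_A20):
    """Flag an insert in which the adapter is joined directly to the poly(A) run."""
    return probe in insert_seq.upper()


# Hypothetical inserts for illustration only.
empty_clone = "GG" + SAL_A20 + "CC"
real_clone = ADAPTER_PART + "ATGGCCGATTAC" + "A" * 20

print(len(SAL_A20), ADAPTER_PART, len(POLY_A_PART))
print(looks_like_empty_clone(empty_clone), looks_like_empty_clone(real_clone))
```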
The image of the autoradiography is scanned into a computer and the signal
intensity
and cold colony addresses of each colony are analyzed. Re-arraying of cold
colonies from
384 well plates to 96 well plates is conducted using Q-bot.
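The bookkeeping behind the re-arraying step can be sketched as below. The membrane capacities of 9,216 and 36,864 colonies correspond to 24 and 96 source plates of 384 wells. The row-major filling order of the destination plates and all function names are assumptions made for illustration; the Q-bot's actual plate handling is not described in the text.

```python
import string

WELLS_384, WELLS_96, COLS_96 = 384, 96, 12


def well_name(index, n_cols):
    """Convert a 0-based well index to a name such as A1, filling row by row."""
    row, col = divmod(index, n_cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"


def rearray(cold_addresses):
    """Assign each cold colony (source plate, source well) to the next free 96-well position."""
    mapping = {}
    for i, source in enumerate(cold_addresses):
        plate, index = divmod(i, WELLS_96)
        mapping[source] = (plate + 1, well_name(index, COLS_96))
    return mapping


# Hypothetical cold-colony addresses on 384-well source plates.
cold = [(1, "A1"), (1, "C7"), (2, "P24")]
print(rearray(cold))
print(9216 // WELLS_384, 36864 // WELLS_384)
```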
Example 4
This example describes identification of the gene from a computer homology
search.
Gene identities can be determined by conducting BLAST (Basic Local Alignment
Search Tool; Altschul, S. F., et al., (1990) J. Mol. Biol. 215:403-410; see
also
www.ncbi.nlm.nih.gov/BLAST/) searches under default parameters for similarity
to
sequences contained in the BLAST "nr" database (comprising all non-redundant
GenBank
CDS translations, sequences derived from the 3-dimensional structure
Brookhaven Protein
Data Bank, the last major release of the SWISS-PROT protein sequence database,
EMBL,
and DDBJ databases). The cDNA sequences are analyzed for similarity to all
publicly
available DNA sequences contained in the "nr" database using the BLASTN
algorithm.
The DNA sequences are translated in all reading frames and compared for
similarity to all
publicly available protein sequences contained in the "nr" database using the
BLASTX
algorithm (Gish, W. and States, D. J. Nature Genetics 3:266-272 (1993))
provided by the
NCBI. In some cases, the sequencing data from two or more clones containing
overlapping
segments of DNA are used to construct contiguous DNA sequences.
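Translation in all reading frames, the step that precedes the BLASTX protein comparison, can be illustrated with the following sketch. It uses the standard genetic code and is not the NCBI implementation; the function names and the nine-base example are hypothetical.

```python
# Standard genetic code, laid out in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become X, stops become *."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTAA"))
```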
Sequence alignments and percent identity calculations can be performed using
the
Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR
Inc.,
Madison, WI). Multiple alignment of the sequences can be performed using the
Clustal
method of alignment (Higgins and Sharp (1989) CABIOS. 5:151-153) with the
default
parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for
pairwise alignments using the Clustal method are KTUPLE=1, GAP PENALTY=3,
WINDOW=5 and DIAGONALS SAVED=5.
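Once two sequences have been aligned, a percent identity can be computed as in the sketch below. This shows one common convention (identical positions divided by aligned positions, ignoring columns with a gap); it is an assumption for illustration and is not claimed to be the formula used by the Megalign program. The function name and example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gap columns are skipped under this convention
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical pre-aligned strings; the alignment itself is not computed here.
print(percent_identity("ACGT-A", "ACCTGA"))
```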
Example 5
This example describes expression of transgenes in monocot cells.
A transgene comprising a cDNA encoding the instant polypeptides in sense
orientation with respect to the maize 27 kD zein promoter that is located 5'
to the cDNA
fragment, and the 10 kD zein 3' end that is located 3' to the cDNA fragment,
can be
constructed. The cDNA fragment of this gene may be generated by polymerase
chain
reaction (PCR) of the cDNA clone using appropriate oligonucleotide primers.
Cloning sites
(NcoI or SmaI) can be incorporated into the oligonucleotides to provide proper
orientation
of the DNA fragment when inserted into the digested vector pML103 as described
below.
Amplification is then performed in a standard PCR. The amplified DNA is then
digested
with restriction enzymes NcoI and SmaI and fractionated on an agarose gel. The
appropriate band can be isolated from the gel and combined with a 4.9 kb
NcoI-SmaI
fragment of the plasmid pML103. Plasmid pML103 has been deposited under the
terms of
the Budapest Treaty at ATCC (American Type Culture Collection, 10801
University Blvd.,
Manassas, VA 20110-2209), and bears accession number ATCC 97366. The DNA
segment from pML103 contains a 1.05 kb SalI-NcoI promoter fragment of the
maize 27 kD
zein gene and a 0.96 kb SmaI-SalI fragment from the 3' end of the maize 10 kD
zein gene in
the vector pGem9Zf(+) (Promega). Vector and insert DNA can be ligated at
15°C
overnight, essentially as described (Maniatis). The ligated DNA may then be
used to
transform E. coli XL1-Blue (Epicurian Coli XL-1 Blue; Stratagene). Bacterial
transformants can be screened by restriction enzyme digestion of plasmid DNA
and limited
nucleotide sequence analysis using the dideoxy chain termination method
(Sequenase DNA
Sequencing Kit; U. S. Biochemical). The resulting plasmid construct would
comprise a
transgene encoding, in the 5' to 3' direction, the maize 27 kD zein promoter,
a cDNA
fragment encoding the instant polypeptides, and the 10 kD zein 3' region.
The transgene described above can then be introduced into maize cells by the
following procedure. Immature maize embryos can be dissected from developing
caryopses derived from crosses of the inbred maize lines H99 and LH132. The
embryos are
isolated 10 to 11 days after pollination when they are 1.0 to 1.5 mm long.
The embryos are
then placed with the axis-side facing down and in contact with
agarose-solidified N6
medium (Chu et al. (1975) Sci. Sin. Peking 18:659-668). The embryos are kept
in the dark
at 27°C. Friable embryogenic callus consisting of undifferentiated
masses of cells with
somatic proembryoids and embryoids borne on suspensor structures proliferates
from the
scutellum of these immature embryos. The embryogenic callus isolated from the
primary
explant can be cultured on N6 medium and sub-cultured on this medium every 2
to
3 weeks.
The plasmid, p35S/Ac (Hoechst Ag, Frankfurt, Germany) or equivalent may be
used
in transformation experiments in order to provide for a selectable marker.
This plasmid
contains the Pat gene (see European Patent Publication 0 242 236) which
encodes
phosphinothricin acetyl transferase (PAT). The enzyme PAT confers resistance
to
herbicidal glutamine synthetase inhibitors such as phosphinothricin. The pat
gene in
p35S/Ac is under the control of the 35S promoter from Cauliflower Mosaic Virus
(Odell et
al. (1985) Nature 313:810-812) and the 3' region of the nopaline synthase gene
from the
T-DNA of the Ti plasmid of Agrobacterium tumefaciens.
The particle bombardment method (Klein et al. (1987) Nature 327:70-73) may be
used to transfer genes to the callus culture cells. According to this method,
gold particles
(1 µm in diameter) are coated with DNA using the following technique. Ten µg
of plasmid
DNAs are added to 50 µL of a suspension of gold particles (60 mg per mL).
Calcium
chloride (50 µL of a 2.5 M solution) and spermidine free base (20 µL of a
1.0 M solution)
are added to the particles. The suspension is vortexed during the addition of
these
solutions. After 10 minutes, the tubes are briefly centrifuged (5 sec at
15,000 rpm) and the
supernatant removed. The particles are resuspended in 200 µL of absolute
ethanol,
centrifuged again and the supernatant removed. The ethanol rinse is performed
again and
the particles resuspended in a final volume of 30 µL of ethanol. An aliquot
(5 µL) of the
DNA-coated gold particles can be placed in the center of a Kapton flying disc
(Bio-Rad
Labs). The particles are then accelerated into the maize tissue with a
Biolistic
PDS-1000/He (Bio-Rad Instruments, Hercules CA), using a helium pressure of
1000 psi, a
gap distance of 0.5 cm and a flying distance of 1.0 cm.
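As a rough check on the quantities in the coating procedure above, the following Python sketch estimates how much gold and DNA each aliquot carries. It assumes, purely for illustration, that all of the added DNA binds the gold and that no particles are lost in the ethanol rinses; the function name per_shot_load is hypothetical and is not part of the described method.

```python
def per_shot_load(dna_ug, gold_ul, gold_mg_per_ml, final_ul, aliquot_ul):
    """Return (mg gold, ug DNA) carried by one aliquot of coated particles.

    Assumption: complete DNA binding and no particle loss during the
    rinses, which a real preparation will only approximate.
    """
    gold_mg = gold_ul / 1000.0 * gold_mg_per_ml  # total gold in the tube
    fraction = aliquot_ul / final_ul             # share loaded per disc
    return gold_mg * fraction, dna_ug * fraction


# Values from the maize procedure: 10 ug DNA, 50 uL of 60 mg/mL gold,
# final volume 30 uL of ethanol, 5 uL placed on each flying disc.
gold_mg, dna_ug = per_shot_load(10, 50, 60, 30, 5)
print(f"{gold_mg:.2f} mg gold and {dna_ug:.2f} ug DNA per shot")
```

Under these assumptions the figures are upper bounds; actual delivery per shot would be lower.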
For bombardment, the embryogenic tissue is placed on filter paper over agarose-
solidified N6 medium. The tissue is arranged as a thin lawn and covers a
circular area of
about 5 cm in diameter. The petri dish containing the tissue can be placed in
the chamber
of the PDS-1000/He approximately 8 cm from the stopping screen. The air in the
chamber
is then evacuated to a vacuum of 28 inches of Hg. The macrocarrier is
accelerated with a
helium shock wave using a rupture membrane that bursts when the He pressure in
the shock
tube reaches 1000 psi.
Seven days after bombardment the tissue can be transferred to N6 medium that
contains gluphosinate (2 mg per liter) and lacks casein or proline. The tissue
continues to
grow slowly on this medium. After an additional 2 weeks the tissue can be
transferred to
fresh N6 medium containing gluphosinate. After 6 weeks, areas of about 1 cm in
diameter
of actively growing callus can be identified on some of the plates containing
the
glufosinate-supplemented medium. These calli may continue to grow when sub-
cultured on
the selective medium.
Plants can be regenerated from the transgenic callus by first transferring
clusters of
tissue to N6 medium supplemented with 0.2 mg per liter of 2,4-D. After two
weeks the
tissue can be transferred to regeneration medium (Fromm et al. (1990)
Bio/Technology
8:833-839).
Example 6
This example describes expression of transgenes in dicot cells.
A seed-specific expression cassette composed of the promoter and transcription
terminator from the gene encoding the β subunit of the seed storage protein
phaseolin from
the bean Phaseolus vulgaris (Doyle et al. (1986) J. Biol. Chem. 261:9228-9238)
can be used
for expression of the instant polypeptides in transformed soybean. The
phaseolin cassette
includes about 500 nucleotides upstream (5') from the translation initiation
codon and about
1650 nucleotides downstream (3') from the translation stop codon of phaseolin.
Between
the 5' and 3' regions are the unique restriction endonuclease sites Nco I
(which includes the
ATG translation initiation codon), SmaI, KpnI and XbaI. The entire cassette is
flanked by
Hind III sites.
The cDNA fragment of this gene may be generated by polymerase chain reaction
(PCR) of the cDNA clone using appropriate oligonucleotide primers. Cloning
sites can be
incorporated into the oligonucleotides to provide proper orientation of the
DNA fragment
when inserted into the expression vector. Amplification is then performed as
described
above, and the isolated fragment is inserted into a pUC 18 vector carrying the
seed
expression cassette.
Soybean embryos may then be transformed with the expression vector comprising
sequences encoding the instant polypeptides. To induce somatic embryos,
cotyledons,
3-5 mm in length dissected from surface sterilized, immature seeds of the
soybean cultivar
A2872, can be cultured in the light or dark at 26°C on an appropriate
agar medium for
6-10 weeks. Somatic embryos which produce secondary embryos are then excised
and
placed into a suitable liquid medium. After repeated selection for clusters of
somatic
embryos which multiplied as early, globular staged embryos, the suspensions
are
maintained as described below.
Soybean embryogenic suspension cultures can be maintained in 35 mL liquid media
on a
rotary shaker, 150 rpm, at 26°C with fluorescent lights on a 16:8 hour
day/night schedule.
Cultures are subcultured every two weeks by inoculating approximately 35 mg of
tissue into
mL of liquid medium.
Soybean embryogenic suspension cultures may then be transformed by the method
of
particle gun bombardment (Klein et al. (1987) Nature (London) 327:70-73, U.S.
Patent No.
4,945,050). A Du Pont Biolistic PDS 1000/HE instrument (helium retrofit) can
be used for
these transformations.
A selectable marker gene which can be used to facilitate soybean
transformation is a
transgene composed of the 35S promoter from Cauliflower Mosaic Virus (Odell et
al. (1985) Nature 313:810-812), the hygromycin phosphotransferase gene
from plasmid
pJR225 (from E. coli; Gritz et al. (1983) Gene 25:179-188) and the 3' region
of the nopaline
synthase gene from the T-DNA of the Ti plasmid of Agrobacterium
tumefaciens. The seed
expression cassette comprising the phaseolin 5' region, the fragment encoding
the instant
polypeptides and the phaseolin 3' region can be isolated as a restriction
fragment. This
fragment can then be inserted into a unique restriction site of the vector
carrying the marker
gene.
To 50 µL of a 60 mg/mL 1 µm gold particle suspension is added (in order): 5
µL
DNA (1 µg/µL), 20 µL spermidine (0.1 M), and 50 µL CaCl2 (2.5 M). The
particle
preparation is then agitated for three minutes, spun in a microfuge for 10
seconds and the
supernatant removed. The DNA-coated particles are then washed once in 400 µL
70%
ethanol and resuspended in 40 µL of anhydrous ethanol. The DNA/particle
suspension can
be sonicated three times for one second each. Five microliters of the DNA-
coated gold
particles are then loaded on each macro carrier disk.
Approximately 300-400 mg of a two-week-old suspension culture is placed in an
empty 60x15 mm petri dish and the residual liquid removed from the tissue with
a pipette.
For each transformation experiment, approximately 5-10 plates of tissue are
normally
bombarded. Membrane rupture pressure is set at 1100 psi and the chamber is
evacuated to a
vacuum of 28 inches mercury. The tissue is placed approximately 3.5 inches
away from the
retaining screen and bombarded three times. Following bombardment, the tissue
can be
divided in half and placed back into liquid and cultured as described above.
Five to seven days post bombardment, the liquid media may be exchanged with
fresh
media, and eleven to twelve days post bombardment with fresh media containing
50 mg/mL
hygromycin. This selective media can be refreshed weekly. Seven to eight weeks
post
bombardment, green, transformed tissue may be observed growing from
untransformed,
necrotic embryogenic clusters. Isolated green tissue is removed and inoculated
into
individual flasks to generate new, clonally propagated, transformed
embryogenic
suspension cultures. Each new line may be treated as an independent
transformation event.
These suspensions can then be subcultured and maintained as clusters of
immature embryos
or regenerated into whole plants by maturation and germination of individual
somatic
embryos.
Example 7
This example describes expression of a transgene in microbial cells.
The cDNAs encoding the instant polypeptides can be inserted into the T7 E.
coli
expression vector pBT430. This vector is a derivative of pET-3a (Rosenberg et
al. (1987)
Gene 56:125-135) which employs the bacteriophage T7 RNA polymerase/T7 promoter
system. Plasmid pBT430 was constructed by first destroying the EcoR I and Hind
III sites
in pET-3a at their original positions. An oligonucleotide adaptor containing
EcoR I and
Hind III sites was inserted at the BamH I site of pET-3a. This created pET-3aM
with
additional unique cloning sites for insertion of genes into the expression
vector. Then, the
Nde I site at the position of translation initiation was converted to an Nco I
site using
oligonucleotide-directed mutagenesis. The DNA sequence of pET-3aM in this
region,
5'-CATATGG, was converted to 5'-CCCATGG in pBT430.
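The effect of this mutagenesis can be checked with a short Python sketch that compares the two seven-base sequences quoted above against the standard NdeI (CATATG) and NcoI (CCATGG) recognition sequences. The helper name has_site and the variable names are illustrative only; nothing here is taken from the vector's full sequence, which is not given in this document.

```python
NDE_I = "CATATG"  # NdeI recognition sequence
NCO_I = "CCATGG"  # NcoI recognition sequence

before = "CATATGG"  # pET-3aM region, as quoted in the text
after = "CCCATGG"   # pBT430 region, as quoted in the text


def has_site(seq, site):
    """Return True if the recognition sequence occurs in seq."""
    return site in seq.upper()


# Positions (0-based) at which the two sequences differ.
changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]

print("NdeI before/after:", has_site(before, NDE_I), has_site(after, NDE_I))
print("NcoI before/after:", has_site(before, NCO_I), has_site(after, NCO_I))
print("changed positions:", changed)
print("ATG start retained:", before.find("ATG") == after.find("ATG"))
```

The comparison shows that two base changes upstream of the ATG are enough to swap one site for the other while leaving the initiation codon in place.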
Plasmid DNA containing a cDNA may be appropriately digested to release a
nucleic
acid fragment encoding the protein. This fragment may then be purified on a 1%
NuSieve
GTG low melting agarose gel (FMC). Buffer and agarose contain 10 µg/mL
ethidium
bromide for visualization of the DNA fragment. The fragment can then be
purified from
the agarose gel by digestion with GELase (Epicentre Technologies) according to
the
manufacturer's instructions, ethanol precipitated, dried and resuspended in 20
µL of water.
Appropriate oligonucleotide adapters may be ligated to the fragment using T4
DNA ligase
(New England Biolabs, Beverly, MA). The fragment containing the ligated
adapters can be
purified from the excess adapters using low melting agarose as described
above. The
vector pBT430 is digested, dephosphorylated with alkaline phosphatase (NEB)
and
deproteinized with phenol/chloroform as described above. The prepared vector
pBT430
and fragment can then be ligated at 16°C for 15 hours followed by
transformation into DH5
electrocompetent cells (GIBCO BRL). Transformants can be selected on agar
plates
containing LB media and 100 µg/mL ampicillin. Transformants containing the
gene
encoding the instant polypeptides are then screened for the correct
orientation with respect
to the T7 promoter by restriction enzyme analysis.
For high level expression, a plasmid clone with the cDNA insert in the correct
orientation relative to the T7 promoter can be transformed into E. coli strain
BL21(DE3)
(Studier et al. (1986) J. Mol. Biol. 189:113-130). Cultures are grown in LB
medium
containing ampicillin (100 mg/L) at 25°C. At an optical density at 600
nm of
approximately 1, IPTG (isopropylthio-β-galactoside, the inducer) can be added
to a final
concentration of 0.4 mM and incubation can be continued for 3 h at
25°C. Cells are then
harvested by centrifugation and re-suspended in 50 µL of 50 mM Tris-HCl at pH
8.0
containing 0.1 mM DTT and 0.2 mM phenyl methylsulfonyl fluoride. A small
amount of
1 mm glass beads can be added and the mixture sonicated 3 times for about 5
seconds each
time with a microprobe sonicator. The mixture is centrifuged and the protein
concentration
of the supernatant determined. One microgram of protein from the soluble
fraction of the
culture can be separated by SDS-polyacrylamide gel electrophoresis. Gels can
be observed
for protein bands migrating at the expected molecular weight.
Example 8
This example describes a procedure to identify plants containing Mu inserted
into
genes of interest and a strategy to identify the function of those genes. This
example is
based on work with the CQRAD17 gene,
which is a member of the same gene family as SEQ
ID NOS: 1 and 5 of the present application. One of skill in the art could
readily conceive of
use of this procedure with the sequences disclosed in the current application.
The Trait Utility System for Corn (TUSC) is a method that employs genetic and
molecular techniques to facilitate the study of gene function in maize.
Studying gene
function implies that the gene's sequence is already known, thus the method
works in
reverse: from sequence to phenotype. This kind of application is referred to
as "reverse
genetics", which contrasts with "forward" methods that are designed to
identify and isolate
the gene(s) responsible for a particular trait (phenotype).
Pioneer Hi-Bred International, Inc., has a proprietary collection of maize
genomic
DNA from approximately 42,000 individual F1 plants (Reverse genetics for
maize, Meeley,
R. and Briggs, S., 1995, Maize Genet. Coop. Newslett. 69:67, 82). The genome of
each of
these individuals contains multiple copies of the transposable element family,
Mutator
(Mu). The Mu family is highly mutagenic; in the presence of the active element
Mu-DR,
these elements transpose throughout the genome, inserting into genic regions,
and often
disrupting gene function. By collecting genomic DNA from a large number
(42,000) of
individuals, Pioneer has assembled a library of the mutagenized maize genome.
Mu insertion events are predominantly heterozygous; given the recessive nature
of
most insertional mutations, the F1 plants appear wild-type. Each of the F1
plants is selfed to
produce F2 seed, which is collected. In generating the F2 progeny, insertional
mutations
segregate in a Mendelian fashion so are useful for investigating a mutant
allele's effect on
the phenotype. The TUSC system has been successfully used by a number of
laboratories
to identify the function of a variety of genes (Cloning and characterization
of the maize An1
gene, Bensen, R.J., et al., 1995, Plant Cell 7:75-84; Diversification of C-
function activity in
maize flower development, Mena, M., et al., 1996, Science 274:1537-1540;
Analysis of a
chemical plant defense mechanism in grasses, Frey, M., et al., 1997, Science
277:696-699;
The control of maize spikelet meristem fate by the APETALA2-like gene
Indeterminate spikelet 1, Chuck, G., Meeley, R.B., and Hake, S., 1998, Genes &
Development 12:1145-1154; A SecY homologue is required for the elaboration of
the
chloroplast thylakoid membrane and for normal chloroplast gene expression,
Roy, L.M. and
Barkan, A., 1998, J. Cell Biol. 141:1-11).
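The Mendelian segregation referred to above can be made concrete with a small Python sketch. It assumes a single heterozygous Mu insertion, equal transmission through both gametes, and no excision of the element; these are simplifying assumptions rather than statements from the source, and the function names are hypothetical. The sample size of twenty plants matches the number of plants sampled per row in the field screen described in this example.

```python
from fractions import Fraction


def selfed_het_ratios():
    """Expected F2 genotype fractions from selfing a +/mu heterozygote."""
    p = Fraction(1, 2)  # chance that a gamete carries the insertion
    return {
        "+/+": (1 - p) ** 2,
        "+/mu": 2 * p * (1 - p),
        "mu/mu": p ** 2,
    }


def prob_at_least_one_homozygote(n_plants):
    """Chance that a sample of n F2 plants includes a mu/mu individual."""
    q = selfed_het_ratios()["mu/mu"]
    return 1 - (1 - q) ** n_plants


ratios = selfed_het_ratios()
print({genotype: str(frac) for genotype, frac in ratios.items()})
print(f"{float(prob_at_least_one_homozygote(20)):.4f}")
```

Under these assumptions a row of twenty F2 plants is very likely to contain at least one homozygous mutant, which is why a modest sample per row can suffice.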
PCR Screening for Mu insertions in CQRAD17:
Two primers were designed from within the CQRAD17 cDNA and designated as
gene-specific primers (GSPs):
Forward primer (GSP1): 5' - TGC TGA TAT CGA GAA GGC CGG AAT CGT -3'
Reverse primer (GSP2): 5' - CTC CCC ACC AGA CCC TTG AGG -3'
Mu TIR primer: 5' - AGA GAA GCC AAC GCC AWC GCC TCY ATT TCG TC -3'
Pickoligo was used to select primers for PCR. This program chooses the Tm
according to the following equation:
Tm = [((GC*3 + AT*2)*37 - 562) / length] - 5
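The equation above can be applied directly to a primer sequence. The following Python sketch is a literal transcription of that formula, not a reimplementation of Pickoligo itself; the function name pickoligo_tm is hypothetical. It assumes that GC and AT are the counts of G/C and A/T bases and that length is the total primer length, so degenerate codes such as W or Y (present in the Mu TIR primer) add to the length but to neither count.

```python
def pickoligo_tm(primer):
    """Tm estimate from the formula quoted in the text.

    Tm = [((GC*3 + AT*2)*37 - 562) / length] - 5
    Spaces in the primer string are ignored.
    """
    seq = "".join(primer.upper().split())
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    return ((gc * 3 + at * 2) * 37 - 562) / len(seq) - 5


# Reverse gene-specific primer (GSP2) as listed above.
gsp2 = "CTC CCC ACC AGA CCC TTG AGG"
print(f"GSP2 Tm estimate: {pickoligo_tm(gsp2):.1f} C")
```

How Pickoligo treats degenerate bases is not stated in the source, so estimates for the Mu TIR primer from this sketch should be read with that caveat.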
PCR reactions were run with an annealing temperature of 62 °C and a
thermocycling
profile as follows:
94 °C - 2' (initial denaturation)
cycles of:
    94 °C - 30" - 1'
    62 °C - 30" - 2'
    72 °C - 1-3'
72 °C - 5' (final extension)
Gel electrophoresis of the PCR products confirmed that there was no false
priming in single
primer reactions and that only one fragment was amplified in paired GSP
reactions.
The genomic DNA from 42,000 plants, combined into pools of 48 plants each, was
subjected to PCR with either GSP1 or GSP2 and Mu TIR. The pools that were
confirmed to
be positive by dot-blot hybridization using CQRAD17 cDNA as a probe were
subjected to
gel-blot analysis in order to determine the size of fragments amplified. The
pools in which
clean fragments were identified were subjected to further analysis to identify
the individual
plants within those pools that contained Mu insertion(s).
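The scale of the pooled screen follows from the numbers given above. The Python sketch below counts the pools and first-round PCR reactions, and then the reactions needed to resolve positive pools. It assumes that each pool is tested once with each gene-specific primer paired with Mu TIR, and that positive pools are resolved by testing every member plant individually; the source does not specify the resolution strategy, so that part, like the function names, is an illustrative assumption.

```python
import math


def pooled_screen_reactions(n_plants, pool_size, primers_per_pool=2):
    """Return (number of pools, first-round PCR reactions)."""
    n_pools = math.ceil(n_plants / pool_size)
    return n_pools, n_pools * primers_per_pool


def deconvolution_reactions(n_positive_pools, pool_size):
    """Reactions needed if every plant in each positive pool is tested once."""
    return n_positive_pools * pool_size


pools, first_round = pooled_screen_reactions(42000, 48)
print(pools, "pools,", first_round, "first-round reactions")
print(deconvolution_reactions(3, 48), "reactions to resolve 3 positive pools")
```

Pooling keeps the first round to a small fraction of the reactions that testing all 42,000 plants individually would require.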
Seed from F1 plants identified in this manner was planted in the field. Leaf
discs
from twenty plants in each F2 row were collected and genomic DNA was isolated.
The
same twenty plants were selfed and the F3 seed saved. Pooled DNA (from 20
plants) from
each of twelve rows was subjected to PCR using GSP1 or GSP2 and Mu TIR primer
as
mentioned above. Three pools identified to contain Mu insertions were
subjected to
individual plant analysis and homozygotes identified. The Mu insertion sites
with the
surrounding signature sequences are identified below:
Allele 1: TCTTCACCA-Mu-GGTCCTTCG
Allele 2: GTCGAAATT-Mu-TTCTTCAGC
Allele 3: GCTCACGGG-Mu-GAAGTTTAT
All three insertions are within 200 nucleotides of each other in the open
reading frame,
suggesting that this region in the gene might represent a hot spot for Mu
insertion. One of
the insertions, allele 3, is in the region predicted to code for a
transmembrane domain. Each
of these insertions is expected to inactivate the gene.
The above examples are provided to illustrate the invention but not to limit
its scope.
Other variants of the invention will be readily apparent to one of ordinary
skill in the art and
are encompassed by the appended claims.
SEQUENCE LISTING
<110> Pioneer Hi-Bred International, Inc.
<120> Maize Cellulose Synthases and Uses
Thereof
<130> 08648-PCT
<150> US 60/096,822
<151> 1998-08-17
<150> US 09/371,383
<151> 1999-08-06
<150> US 09/550,483
<151> 2000-04-14
<160> 12
<170> FastSEQ for Windows Version 3.0
<210> 1
<211> 2830
<212> DNA
<213> Zea mays
<220>
<221> CDS
<222> (3)...(2468)
<221> misc_feature
<222> (1)...(2830)
<223> n = A,T,C or G
<400> 1
tc cct cta agt cgc ata gtt ccg ata tct cca aac gag ctt aac ctt 47
Pro Leu Ser Arg Ile Val Pro Ile Ser Pro Asn Glu Leu Asn Leu
1 5 10 15
tat cgg atc gtg att gtt ctc cgg ctt atc atc cta tgt ttc ttc ttt 95
Tyr Arg Ile Val Ile Val Leu Arg Leu Ile Ile Leu Cys Phe Phe Phe
20 25 30
caa tat cgt ata act cat cca gtg gaa gat gct tat ggg ttg tgg ctt 143
Gln Tyr Arg Ile Thr His Pro Val Glu Asp Ala Tyr Gly Leu Trp Leu
35 40 45
gta tct gtt att tgt gaa gtt tgg ttt gcc ttg tct tgg ctt cta gat 191
Val Ser Val Ile Cys Glu Val Trp Phe Ala Leu Ser Trp Leu Leu Asp
50 55 60
cag ttc cca aag tgg tat cct atc aac cgt gaa act tac ctc gat aga 239
Gln Phe Pro Lys Trp Tyr Pro Ile Asn Arg Glu Thr Tyr Leu Asp Arg
65 70 75
ctt gca ttg aga tat gat agg gag ggt gag cca tcc cag ttg gct cca 287
Leu Ala Leu Arg Tyr Asp Arg Glu Gly Glu Pro Ser Gln Leu Ala Pro
80 85 90 95
atc gat gtc ttt gtt agt aca gtg gat cca ctt aag gaa cct cct cta 335
Ile Asp Val Phe Val Ser Thr Val Asp Pro Leu Lys Glu Pro Pro Leu
100 105 110
att act ggc aac act gtc ctg tcc att ctt gct gtg gat tac cct gtt 383
Ile Thr Gly Asn Thr Val Leu Ser Ile Leu Ala Val Asp Tyr Pro Val
115 120 125
gac aaa gta tca tgt tat gtt tct gat gac ggt tca gct atg ttg act 431
Asp Lys Val Ser Cys Tyr Val Ser Asp Asp Gly Ser Ala Met Leu Thr
130 135 140
ttt gaa gcg cta tct gaa acc gca gag ttt gca agg aaa tgg gtt ccc 479
Phe Glu Ala Leu Ser Glu Thr Ala Glu Phe Ala Arg Lys Trp Val Pro
145 150 155
ttt tgc aag aaa cac aat att gaa cct agg gct cca gag ttt tac ttt 527
Phe Cys Lys Lys His Asn Ile Glu Pro Arg Ala Pro Glu Phe Tyr Phe
160 165 170 175
gct cga aag ata gat tac cta aag gac aaa ata caa cct tct ttt gtg 575
Ala Arg Lys Ile Asp Tyr Leu Lys Asp Lys Ile Gln Pro Ser Phe Val
180 185 190
aaa gaa agg cgg gct atg aag agg gag tgt gaa gag ttc aaa gta cgg 623
Lys Glu Arg Arg Ala Met Lys Arg Glu Cys Glu Glu Phe Lys Val Arg
195 200 205
atc gat gcc ctt gtt gca aaa gcg caa aaa ata cct gag gag ggc tgg 671
Ile Asp Ala Leu Val Ala Lys Ala Gln Lys Ile Pro Glu Glu Gly Trp
210 215 220
acc atg gct gat ggc act cct tgg cct ggg aat aac cct aga gat cat 719
Thr Met Ala Asp Gly Thr Pro Trp Pro Gly Asn Asn Pro Arg Asp His
225 230 235
cca gga atg atc caa gta ttc ttg ggc cac agt ggt ggg ctt gac acg 767
Pro Gly Met Ile Gln Val Phe Leu Gly His Ser Gly Gly Leu Asp Thr
240 245 250 255
gat ggg aat gag ttg cca cgg ctt gtt tat gtt tct cgt gaa aag agg 815
Asp Gly Asn Glu Leu Pro Arg Leu Val Tyr Val Ser Arg Glu Lys Arg
260 265 270
cca ggc ttc cag cac cac aag aag gct ggt gcc atg aat gct ttg att 863
Pro Gly Phe Gln His His Lys Lys Ala Gly Ala Met Asn Ala Leu Ile
275 280 285
cgc gta tca gct gtc ctg acg aat ggt gct tat ctt ctt aat gtg gat 911
Arg Val Ser Ala Val Leu Thr Asn Gly Ala Tyr Leu Leu Asn Val Asp
290 295 300
tgt gat cac tac ttc aat agc agc aaa gct ctt aga gag gct atg tgt 959
Cys Asp His Tyr Phe Asn Ser Ser Lys Ala Leu Arg Glu Ala Met Cys
305 310 315
ttc atg atg gat cca gca cta gga agg aaa act tgc tat gtt cag ttt 1007
Phe Met Met Asp Pro Ala Leu Gly Arg Lys Thr Cys Tyr Val Gln Phe
320 325 330 335
cca caa aga ttt gat ggt ata gac ttg cat gat cga tat gca aac cgg 1055
Pro Gln Arg Phe Asp Gly Ile Asp Leu His Asp Arg Tyr Ala Asn Arg
340 345 350
aac att gtc ttc ttt gat att aat atg aag ggt cta gat ggc att caa 1103
Asn Ile Val Phe Phe Asp Ile Asn Met Lys Gly Leu Asp Gly Ile Gln
355 360 365
gga cct gtt tat gtg gga aca gga tgc tgt ttc aat agg cag gcc ttg 1151
Gly Pro Val Tyr Val Gly Thr Gly Cys Cys Phe Asn Arg Gln Ala Leu
370 375 380
tat ggc tat gat cct gta ttg aca gaa gct gat ttg gag cct aac att 1199
Tyr Gly Tyr Asp Pro Val Leu Thr Glu Ala Asp Leu Glu Pro Asn Ile
385 390 395
atc att aaa agt tgc tgt ggc gga aga aaa aag aag gac aag agc tat 1247
Ile Ile Lys Ser Cys Cys Gly Gly Arg Lys Lys Lys Asp Lys Ser Tyr
400 405 410 415
att gat tcc aaa aac cgt gat atg aag aga aca gaa tct tcg gct ccc 1295
Ile Asp Ser Lys Asn Arg Asp Met Lys Arg Thr Glu Ser Ser Ala Pro
420 425 430
atc ttc aac atg gaa gat ata gaa gag gga ttt gaa ggt tac gag gat 1343
Ile Phe Asn Met Glu Asp Ile Glu Glu Gly Phe Glu Gly Tyr Glu Asp
435 440 445
gaa agg tca ctg ctt atg tct cag aag agc ttg gag aaa cgc ttt ggc 1391
Glu Arg Ser Leu Leu Met Ser Gln Lys Ser Leu Glu Lys Arg Phe Gly
450 455 460
cag tct cca att ttt att gca tcc acc ttt atg act caa ggt ggc ata 1439
Gln Ser Pro Ile Phe Ile Ala Ser Thr Phe Met Thr Gln Gly Gly Ile
465 470 475
ccc cct tca aca aac cca ggt tcc ctg cta aag gaa gct ata cat gtc 1487
Pro Pro Ser Thr Asn Pro Gly Ser Leu Leu Lys Glu Ala Ile His Val
480 485 490 495
att agt tgt gga tat gag gat aaa aca gaa tgg ggg aaa gag atc gga 1535
Ile Ser Cys Gly Tyr Glu Asp Lys Thr Glu Trp Gly Lys Glu Ile Gly
500 505 510
tgg ata tat ggc tct gtt act gaa gat att tta act ggt ttc aag atg 1583
Trp Ile Tyr Gly Ser Val Thr Glu Asp Ile Leu Thr Gly Phe Lys Met
515 520 525
cat gca aga ggt tgg ata tcc atc tac tgc atg cca ctt cgg cct tgc 1631
His Ala Arg Gly Trp Ile Ser Ile Tyr Cys Met Pro Leu Arg Pro Cys
530 535 540
ttc aag ggt tct gct cca att aat ctt tct gat cgt ctc aac caa gtg 1679
Phe Lys Gly Ser Ala Pro Ile Asn Leu Ser Asp Arg Leu Asn Gln Val
545 550 555
tta cgc tgg gct ctt ggt tca gtt gaa att cta ctt agc aga cac tgt 1727
Leu Arg Trp Ala Leu Gly Ser Val Glu Ile Leu Leu Ser Arg His Cys
560 565 570 575
cct atc tgg tat ggt tac aat gga agg cta aag ctt ctg gag aga ctg 1775
Pro Ile Trp Tyr Gly Tyr Asn Gly Arg Leu Lys Leu Leu Glu Arg Leu
580 585 590
gca tac atc aac acc att gtt tat cca att aca tct atc cca cta gta 1823
Ala Tyr Ile Asn Thr Ile Val Tyr Pro Ile Thr Ser Ile Pro Leu Val
595 600 605
gca tac tgc gtc ctt cct gct atc tgt tta ctc acc aac aaa ttt att 1871
Ala Tyr Cys Val Leu Pro Ala Ile Cys Leu Leu Thr Asn Lys Phe Ile
610 615 620
att cct gcg att agc aat tat gct ggg gcg ttc ttc atc ctg ctt ttt 1919
Ile Pro Ala Ile Ser Asn Tyr Ala Gly Ala Phe Phe Ile Leu Leu Phe
625 630 635
gct tcc atc ttc gcc act ggt att ttg gag ctt cga tgg agt ggt gtt 1967
Ala Ser Ile Phe Ala Thr Gly Ile Leu Glu Leu Arg Trp Ser Gly Val
640 645 650 655
ggc att gag gat tgg tgg aga aat gag cag ttt tgg gtc att ggt ggc 2015
Gly Ile Glu Asp Trp Trp Arg Asn Glu Gln Phe Trp Val Ile Gly Gly
660 665 670
acc tct gca cat ctc ttt gct gtg ttc caa ggt ctc tta aaa gtg cta 2063
Thr Ser Ala His Leu Phe Ala Val Phe Gln Gly Leu Leu Lys Val Leu
675 680 685
gca ggg atc gac aca aac ttc acg gtc aca tca aag gca acc gat gat 2111
Ala Gly Ile Asp Thr Asn Phe Thr Val Thr Ser Lys Ala Thr Asp Asp
690 695 700
gat ggt gat ttt gct gag ctg tat gtg ttc aag tgg aca act ctt ctg 2159
Asp Gly Asp Phe Ala Glu Leu Tyr Val Phe Lys Trp Thr Thr Leu Leu
705 710 715
atc ccc ccc acc act gtg ctt gtg att aac ctg gtt ggt ata gtg gct 2207
Ile Pro Pro Thr Thr Val Leu Val Ile Asn Leu Val Gly Ile Val Ala
720 725 730 735
gga gtg tcg tat gct atc aac agt ggc tac caa tca tgg ggt cca cta 2255
Gly Val Ser Tyr Ala Ile Asn Ser Gly Tyr Gln Ser Trp Gly Pro Leu
740 745 750
ttc ggg aag ctg ttc ttt gca atc tgg gtg atc ctc cac ctc tac cct 2303
Phe Gly Lys Leu Phe Phe Ala Ile Trp Val Ile Leu His Leu Tyr Pro
755 760 765
ttc ctg aag ggt ctc atg ggg aag cag aac cgc aca ccg acc atc gtc 2351
Phe Leu Lys Gly Leu Met Gly Lys Gln Asn Arg Thr Pro Thr Ile Val
770 775 780
atc gtt tgg tcc gtc ctt ctt gct tcc ata ttc tcg ctg ctg tgg gtg 2399
Ile Val Trp Ser Val Leu Leu Ala Ser Ile Phe Ser Leu Leu Trp Val
785 790 795
aag atc gac ccc ttc ata tcc cct acc cag aag gct ctt tcc cgt ggg 2447
Lys Ile Asp Pro Phe Ile Ser Pro Thr Gln Lys Ala Leu Ser Arg Gly
800 805 810 815
cag tgt ggt gta aac tgc tga aatgatccga actgcctgct gaataacatt 2498
Gln Cys Gly Val Asn Cys
820
gctccggcacaatcatgatctaccccttcgtgtaaataccagaggttaggcaagactttt 2558
cttggtaggtggcgaagatgtgtcgtttaagttcactctactgcatttggggtgggcagc 2618
atgaaactttgtcaacttatgtcgtgctacttatttgtagctaagtagcagtaagtagtg 2678
cctgtttcatgttgactgtcgtgactacctgttcaccgtgggctctggactgtcgtgatg 2738
taacctgtatgttggaacttcaagtactgattgagctgtttggtcaatgacattgaggga 2798
ttctctctctngaaattaanacaaantnggnt 2830
<210> 2
<211> 821
<212> PRT
<213> Zea mays
<400> 2
Pro Leu Ser Arg Ile Val Pro Ile Ser Pro Asn Glu Leu Asn Leu Tyr
1 5 10 15
Arg Ile Val Ile Val Leu Arg Leu Ile Ile Leu Cys Phe Phe Phe Gln
20 25 30
Tyr Arg Ile Thr His Pro Val Glu Asp Ala Tyr Gly Leu Trp Leu Val
35 40 45
Ser Val Ile Cys Glu Val Trp Phe Ala Leu Ser Trp Leu Leu Asp Gln
50 55 60
Phe Pro Lys Trp Tyr Pro Ile Asn Arg Glu Thr Tyr Leu Asp Arg Leu
65 70 75 80
Ala Leu Arg Tyr Asp Arg Glu Gly Glu Pro Ser Gln Leu Ala Pro Ile
85 90 95
Asp Val Phe Val Ser Thr Val Asp Pro Leu Lys Glu Pro Pro Leu Ile
100 105 110
Thr Gly Asn Thr Val Leu Ser Ile Leu Ala Val Asp Tyr Pro Val Asp
115 120 125
Lys Val Ser Cys Tyr Val Ser Asp Asp Gly Ser Ala Met Leu Thr Phe
130 135 140
Glu Ala Leu Ser Glu Thr Ala Glu Phe Ala Arg Lys Trp Val Pro Phe
145 150 155 160
Cys Lys Lys His Asn Ile Glu Pro Arg Ala Pro Glu Phe Tyr Phe Ala
165 170 175
Arg Lys Ile Asp Tyr Leu Lys Asp Lys Ile Gln Pro Ser Phe Val Lys
180 185 190
Glu Arg Arg Ala Met Lys Arg Glu Cys Glu Glu Phe Lys Val Arg Ile
195 200 205
Asp Ala Leu Val Ala Lys Ala Gln Lys Ile Pro Glu Glu Gly Trp Thr
210 215 220
Met Ala Asp Gly Thr Pro Trp Pro Gly Asn Asn Pro Arg Asp His Pro
225 230 235 240
Gly Met Ile Gln Val Phe Leu Gly His Ser Gly Gly Leu Asp Thr Asp
245 250 255
Gly Asn Glu Leu Pro Arg Leu Val Tyr Val Ser Arg Glu Lys Arg Pro
260 265 270
Gly Phe Gln His His Lys Lys Ala Gly Ala Met Asn Ala Leu Ile Arg
275 280 285
Val Ser Ala Val Leu Thr Asn Gly Ala Tyr Leu Leu Asn Val Asp Cys
290 295 300
Asp His Tyr Phe Asn Ser Ser Lys Ala Leu Arg Glu Ala Met Cys Phe
305 310 315 320
Met Met Asp Pro Ala Leu Gly Arg Lys Thr Cys Tyr Val Gln Phe Pro
325 330 335
Gln Arg Phe Asp Gly Ile Asp Leu His Asp Arg Tyr Ala Asn Arg Asn
340 345 350
Ile Val Phe Phe Asp Ile Asn Met Lys Gly Leu Asp Gly Ile Gln Gly
355 360 365
Pro Val Tyr Val Gly Thr Gly Cys Cys Phe Asn Arg Gln Ala Leu Tyr
370 375 380
Gly Tyr Asp Pro Val Leu Thr Glu Ala Asp Leu Glu Pro Asn Ile Ile
385 390 395 400
Ile Lys Ser Cys Cys Gly Gly Arg Lys Lys Lys Asp Lys Ser Tyr Ile
405 410 415
Asp Ser Lys Asn Arg Asp Met Lys Arg Thr Glu Ser Ser Ala Pro Ile
420 425 430
Phe Asn Met Glu Asp Ile Glu Glu Gly Phe Glu Gly Tyr Glu Asp Glu
435 440 445
Arg Ser Leu Leu Met Ser Gln Lys Ser Leu Glu Lys Arg Phe Gly Gln
450 455 460
Ser Pro Ile Phe Ile Ala Ser Thr Phe Met Thr Gln Gly Gly Ile Pro
465 470 475 480
Pro Ser Thr Asn Pro Gly Ser Leu Leu Lys Glu Ala Ile His Val Ile
485 490 495
Ser Cys Gly Tyr Glu Asp Lys Thr Glu Trp Gly Lys Glu Ile Gly Trp
500 505 510
Ile Tyr Gly Ser Val Thr Glu Asp Ile Leu Thr Gly Phe Lys Met His
515 520 525
Ala Arg Gly Trp Ile Ser Ile Tyr Cys Met Pro Leu Arg Pro Cys Phe
530 535 540
Lys Gly Ser Ala Pro Ile Asn Leu Ser Asp Arg Leu Asn Gln Val Leu
545 550 555 560
Arg Trp Ala Leu Gly Ser Val Glu Ile Leu Leu Ser Arg His Cys Pro
565 570 575
Ile Trp Tyr Gly Tyr Asn Gly Arg Leu Lys Leu Leu Glu Arg Leu Ala
580 585 590
Tyr Ile Asn Thr Ile Val Tyr Pro Ile Thr Ser Ile Pro Leu Val Ala
595 600 605
Tyr Cys Val Leu Pro Ala Ile Cys Leu Leu Thr Asn Lys Phe Ile Ile
610 615 620
Pro Ala Ile Ser Asn Tyr Ala Gly Ala Phe Phe Ile Leu Leu Phe Ala
625 630 635 640
Ser Ile Phe Ala Thr Gly Ile Leu Glu Leu Arg Trp Ser Gly Val Gly
645 650 655
Ile Glu Asp Trp Trp Arg Asn Glu Gln Phe Trp Val Ile Gly Gly Thr
660 665 670
Ser Ala His Leu Phe Ala Val Phe Gln Gly Leu Leu Lys Val Leu Ala
675 680 685
Gly Ile Asp Thr Asn Phe Thr Val Thr Ser Lys Ala Thr Asp Asp Asp
690 695 700
Gly Asp Phe Ala Glu Leu Tyr Val Phe Lys Trp Thr Thr Leu Leu Ile
705 710 715 720
Pro Pro Thr Thr Val Leu Val Ile Asn Leu Val Gly Ile Val Ala Gly
725 730 735
Val Ser Tyr Ala Ile Asn Ser Gly Tyr Gln Ser Trp Gly Pro Leu Phe
740 745 750
Gly Lys Leu Phe Phe Ala Ile Trp Val Ile Leu His Leu Tyr Pro Phe
755 760 765
Leu Lys Gly Leu Met Gly Lys Gln Asn Arg Thr Pro Thr Ile Val Ile
770 775 780
Val Trp Ser Val Leu Leu Ala Ser Ile Phe Ser Leu Leu Trp Val Lys
785 790 795 800
Ile Asp Pro Phe Ile Ser Pro Thr Gln Lys Ala Leu Ser Arg Gly Gln
805 810 815
Cys Gly Val Asn Cys
820
<210> 3
<211> 25
<212> DNA
<213> Zea mays
<400> 3
cctctaagtc gcatagttcc gatat 25
<210> 4
<211> 25
<212> DNA
<213> Zea mays
<400> 4
tcagcagttt acaccacact gccca 25
<210> 5
<211> 3799
<212> DNA
<213> Zea mays
<220>
<221> CDS
<222> (238)...(3477)
<221> misc_feature
<222> (1)...(3799)
<223> n = A,T,C or G
<400> 5
caactcacgt tgccgcggct tcctccatcg gtgcggtgcc ctgtcctttt ctctcctcca 60
cctccctagt ccctcctccc ccccgcatac atagctacta ctagtagcac cacgctcgca 120
gcgggagatg cggtgctgat ccgtgcccct gctcggatct cgggagtggt gccgacttgt 180
gtcgcttcgg ctctgcctag gccagctcct tgtcggttct gggcgagctc gcctgcc atg 240
Met
1
gag ggc gac gcg gac ggc gtg aag tcg ggg agg cgc ggg gga ggg cag 288
Glu Gly Asp Ala Asp Gly Val Lys Ser Gly Arg Arg Gly Gly Gly Gln
10 15
gtg tgc cag atc tgc ggc gat ggc gtg ggc act acg gcg gag gga gac 336
Val Cys Gln Ile Cys Gly Asp Gly Val Gly Thr Thr Ala Glu Gly Asp
20 25 30
gtc ttc acc gcc tgc gac gtc tgc ggg ttc ccg gtg tgc cgc ccc tgc 384
Val Phe Thr Ala Cys Asp Val Cys Gly Phe Pro Val Cys Arg Pro Cys
35 40 45
tac gag tac gag cgc aag gac ggc aca caa gcg tgc ccc cag tgc aaa 432
Tyr Glu Tyr Glu Arg Lys Asp Gly Thr Gln Ala Cys Pro Gln Cys Lys
50 55 60 65
aac aag tac aag cgc cac aag ggg agt cca gcg atc cga ggg gag gaa 480
Asn Lys Tyr Lys Arg His Lys Gly Ser Pro Ala Ile Arg Gly Glu Glu
70 75 80
gga gac gat act gat gcc gat gat gct agc gac ttc aac tac cct gca 528
Gly Asp Asp Thr Asp Ala Asp Asp Ala Ser Asp Phe Asn Tyr Pro Ala
85 90 95
tct ggc aat gac gac cag aag cag aag att gct gac agg atg cgc agc 576
Ser Gly Asn Asp Asp Gln Lys Gln Lys Ile Ala Asp Arg Met Arg Ser
100 105 110
tgg cgc atg aat gct ggg ggc agc ggg gat gtt ggc cgc ccc aag tat 624
Trp Arg Met Asn Ala Gly Gly Ser Gly Asp Val Gly Arg Pro Lys Tyr
115 120 125
gac agt ggt gag atc ggg ctt acc aag tac gac agt ggt gag atc cct 672
Asp Ser Gly Glu Ile Gly Leu Thr Lys Tyr Asp Ser Gly Glu Ile Pro
130 135 140 145
cgg gga tac atc ccg tca gtc act aac agc cag att tcg gga gaa atc 720
Arg Gly Tyr Ile Pro Ser Val Thr Asn Ser Gln Ile Ser Gly Glu Ile
150 155 160
cct ggt gct tcc cct gac cat cat atg atg tct cct act ggg aac att 768
Pro Gly Ala Ser Pro Asp His His Met Met Ser Pro Thr Gly Asn Ile
165 170 175
ggc agg cgc gcc cca ttt ccc tat atg aat cat tca tca aat ccg tcg 816
Gly Arg Arg Ala Pro Phe Pro Tyr Met Asn His Ser Ser Asn Pro Ser
180 185 190
agg gaa ttc tct ggt agc gtt ggg aat gtt gcc tgg aaa gag agg gtt 864
Arg Glu Phe Ser Gly Ser Val Gly Asn Val Ala Trp Lys Glu Arg Val
195 200 205
gat ggc tgg aaa atg aag cag gac aag gga aca att ccc atg acg aat 912
Asp Gly Trp Lys Met Lys Gln Asp Lys Gly Thr Ile Pro Met Thr Asn
210 215 220 225
ggc aca agc att gct ccc tct gag ggc cgg ggt gtt ggt gat att gat 960
Gly Thr Ser Ile Ala Pro Ser Glu Gly Arg Gly Val Gly Asp Ile Asp
230 235 240
gca tca act gat tac aac atg gaa gat gcc tta tta aac gat gaa act 1008
Ala Ser Thr Asp Tyr Asn Met Glu Asp Ala Leu Leu Asn Asp Glu Thr
245 250 255
cgc cag cct cta tct agg aaa gtt cca ctt cct tcc tcc agg ata aat 1056
Arg Gln Pro Leu Ser Arg Lys Val Pro Leu Pro Ser Ser Arg Ile Asn
260 265 270
cca tac agg atg gtc att gtg cta cga ttg att gtt cta agc atc ttc 1104
Pro Tyr Arg Met Val Ile Val Leu Arg Leu Ile Val Leu Ser Ile Phe
275 280 285
ttg cac tac cgg atc aca aat cct gtg cgt aat gca tac cca ctg tgg 1152
Leu His Tyr Arg Ile Thr Asn Pro Val Arg Asn Ala Tyr Pro Leu Trp
290 295 300 305
ctt cta tct gtt ata tgt gag atc tgg ttt gct ctt tcc tgg ata ttg 1200
Leu Leu Ser Val Ile Cys Glu Ile Trp Phe Ala Leu Ser Trp Ile Leu
310 315 320
gat cag ttt cca aag tgg ttt cca atc aac cgc gag act tac ctt gat 1248
Asp Gln Phe Pro Lys Trp Phe Pro Ile Asn Arg Glu Thr Tyr Leu Asp
325 330 335
aga ctc gca tta agg tat gac cgg gaa ggt gag cca tct cag ttg gct 1296
Arg Leu Ala Leu Arg Tyr Asp Arg Glu Gly Glu Pro Ser Gln Leu Ala
340 345 350
gct gtt gac att ttt gtc agt act gtc gac cca atg aag gag cct cct 1344
Ala Val Asp Ile Phe Val Ser Thr Val Asp Pro Met Lys Glu Pro Pro
355 360 365
ctt gtc act gcc aat acc gtg cta tcc att ctc gct gtg gac tat cct 1392
Leu Val Thr Ala Asn Thr Val Leu Ser Ile Leu Ala Val Asp Tyr Pro
370 375 380 385
gtg gat aag gtc tct tgc tat gta tct gat gat gga gct gct atg ctg 1440
Val Asp Lys Val Ser Cys Tyr Val Ser Asp Asp Gly Ala Ala Met Leu
390 395 400
aca ttt gat gca cta gct gag act tca gag ttt gct aga aaa tgg gtg 1488
Thr Phe Asp Ala Leu Ala Glu Thr Ser Glu Phe Ala Arg Lys Trp Val
405 410 415
cca ttt gtt aag aag tac aac att gaa cct aga gct cct gaa tgg tac 1536
Pro Phe Val Lys Lys Tyr Asn Ile Glu Pro Arg Ala Pro Glu Trp Tyr
420 425 430
ttc tcc cag aaa att gat tac ttg aag gac aaa gtg cac cct tca ttt 1584
Phe Ser Gln Lys Ile Asp Tyr Leu Lys Asp Lys Val His Pro Ser Phe
435 440 445
gtt aaa gac cgc cgg gcc atg aag aga gaa tat gaa gaa ttc aaa att 1632
Val Lys Asp Arg Arg Ala Met Lys Arg Glu Tyr Glu Glu Phe Lys Ile
450 455 460 465
agg gta aat ggc ctt gtt gct aag gca caa aaa gtc cct gag gaa gga 1680
Arg Val Asn Gly Leu Val Ala Lys Ala Gln Lys Val Pro Glu Glu Gly
470 475 480
tgg atc atg caa gat ggc aca cca tgg cca gga aac aat acc agg gac 1728
Trp Ile Met Gln Asp Gly Thr Pro Trp Pro Gly Asn Asn Thr Arg Asp
485 490 495
cat cct gga atg att cag gtt ttc ctt ggt cac agt ggt ggt ctt gat 1776
His Pro Gly Met Ile Gln Val Phe Leu Gly His Ser Gly Gly Leu Asp
500 505 510
act gag ggt aat gag cta ccc cgt ttg gtc tat gtt tct cgt gaa aaa 1824
Thr Glu Gly Asn Glu Leu Pro Arg Leu Val Tyr Val Ser Arg Glu Lys
515 520 525
cgt cct gga ttc cag cat cac aag aaa gct ggt gcc atg aat gct ctt 1872
Arg Pro Gly Phe Gln His His Lys Lys Ala Gly Ala Met Asn Ala Leu
530 535 540 545
gtc cgc gtc tca gct gtg ctt acc aat gga caa tac atg ttg aat ctt 1920
Val Arg Val Ser Ala Val Leu Thr Asn Gly Gln Tyr Met Leu Asn Leu
550 555 560
gat tgt gat cac tac atc aac aac agt aag gct ctc agg gaa gct atg 1968
Asp Cys Asp His Tyr Ile Asn Asn Ser Lys Ala Leu Arg Glu Ala Met
565 570 575
tgc ttc ctt atg gat cct aac cta gga agg agt gtc tgc tat gtt cag 2016
Cys Phe Leu Met Asp Pro Asn Leu Gly Arg Ser Val Cys Tyr Val Gln
580 585 590
ttt ccc cag agg ttc gat ggt att gat agg aat gat cga tat gcc aac 2064
Phe Pro Gln Arg Phe Asp Gly Ile Asp Arg Asn Asp Arg Tyr Ala Asn
595 600 605
agg aac acc gtg ttt ttc gat att aac ttg aga ggt ctt gat ggc atc 2112
Arg Asn Thr Val Phe Phe Asp Ile Asn Leu Arg Gly Leu Asp Gly Ile
610 615 620 625
caa gga cca gtt tat gtg ggc act ggc tgt gtt ttc aac aga aca gct 2160
Gln Gly Pro Val Tyr Val Gly Thr Gly Cys Val Phe Asn Arg Thr Ala
630 635 640
cta tat ggt tat gag ccc cca att aag caa aag aag ggt ggt ttc ttg 2208
Leu Tyr Gly Tyr Glu Pro Pro Ile Lys Gln Lys Lys Gly Gly Phe Leu
645 650 655
tca tca cta tgt ggt ggc agg aag aag gga agc aaa tca aag aag ggc 2256
Ser Ser Leu Cys Gly Gly Arg Lys Lys Gly Ser Lys Ser Lys Lys Gly
660 665 670
tca gac aag aaa aag tca cag aag cat gtg gac agt tct gtg cca gta 2304
Ser Asp Lys Lys Lys Ser Gln Lys His Val Asp Ser Ser Val Pro Val
675 680 685
ttc aat ctt gaa gat ata gag gag gga gtt gaa ggc gct gga ttt gat 2352
Phe Asn Leu Glu Asp Ile Glu Glu Gly Val Glu Gly Ala Gly Phe Asp
690 695 700 705
gat gag aaa tca ctt ctt atg tct caa atg agc ttg gag aag aga ttt 2400
Asp Glu Lys Ser Leu Leu Met Ser Gln Met Ser Leu Glu Lys Arg Phe
710 715 720
ggc caa tct gca gct ttt gtt gcg tcc act ctg atg gaa tat ggt ggt 2448
Gly Gln Ser Ala Ala Phe Val Ala Ser Thr Leu Met Glu Tyr Gly Gly
725 730 735
gtt cct cag tct gcg act cca gaa tct ctt ctg aaa gaa gct atc cat 2496
Val Pro Gln Ser Ala Thr Pro Glu Ser Leu Leu Lys Glu Ala Ile His
740 745 750
gtc ata agt tgt ggc tac gag gac aag att gaa tgg gga act gag att 2544
Val Ile Ser Cys Gly Tyr Glu Asp Lys Ile Glu Trp Gly Thr Glu Ile
755 760 765
ggg tgg atc tat ggt tct gtg acg gaa gat att ctc act ggg ttc aag 2592
Gly Trp Ile Tyr Gly Ser Val Thr Glu Asp Ile Leu Thr Gly Phe Lys
770 775 780 785
atg cac gca cga ggc tgg cgg tcg atc tac tgc atg cct aag cgg ccg 2640
Met His Ala Arg Gly Trp Arg Ser Ile Tyr Cys Met Pro Lys Arg Pro
790 795 800
gcc ttc aag gga tcg gct ccc atc aat ctc tca gac cgt ctg aac cag 2688
Ala Phe Lys Gly Ser Ala Pro Ile Asn Leu Ser Asp Arg Leu Asn Gln
805 810 815
gtg ctc cgg tgg gct ctc ggt tca gtg gaa atc ctt ttc agc cgg cat 2736
Val Leu Arg Trp Ala Leu Gly Ser Val Glu Ile Leu Phe Ser Arg His
820 825 830
tgc ccc cta tgg tac ggg tac gga gga cgc ctg aag ttc ttg gag aga 2784
Cys Pro Leu Trp Tyr Gly Tyr Gly Gly Arg Leu Lys Phe Leu Glu Arg
835 840 845
ttc gcc tac atc aac acc acc atc tac ccg ctc acg tcc ctc ccg ctc 2832
Phe Ala Tyr Ile Asn Thr Thr Ile Tyr Pro Leu Thr Ser Leu Pro Leu
850 855 860 865
ctc att tac tgt atc ctg cct gcc atc tgc ctg ctc acg ggg aag ttc 2880
Leu Ile Tyr Cys Ile Leu Pro Ala Ile Cys Leu Leu Thr Gly Lys Phe
870 875 880
atc atc cca gag atc agc aac ttc gct agt atc tgg ttc atc tct ctc 2928
Ile Ile Pro Glu Ile Ser Asn Phe Ala Ser Ile Trp Phe Ile Ser Leu
885 890 895
ttc atc tcg atc ttc gcc acg ggt atc ctg gag atg agg tgg agc ggc 2976
Phe Ile Ser Ile Phe Ala Thr Gly Ile Leu Glu Met Arg Trp Ser Gly
900 905 910
gtg ggc atc gac gag tgg tgg agg aac gag cag ttc tgg gtc atc gga 3024
Val Gly Ile Asp Glu Trp Trp Arg Asn Glu Gln Phe Trp Val Ile Gly
915 920 925
ggc atc tcc gcc cac ctc ttc gcc gtc ttc cag ggc ctc ctc aag gtg 3072
Gly Ile Ser Ala His Leu Phe Ala Val Phe Gln Gly Leu Leu Lys Val
930 935 940 945
ctt gcc ggc atc gac acc aac ttc acc gtc acc tcc aag gcc tcg gat 3120
Leu Ala Gly Ile Asp Thr Asn Phe Thr Val Thr Ser Lys Ala Ser Asp
950 955 960
gaa gac ggc gac ttc gcg gag ctg tac atg ttc aag tgg acg aca ctt 3168
Glu Asp Gly Asp Phe Ala Glu Leu Tyr Met Phe Lys Trp Thr Thr Leu
965 970 975
ctg atc ccg ccc acc acc atc ctg atc atc aac ctg gtc ggc gtt gtt 3216
Leu Ile Pro Pro Thr Thr Ile Leu Ile Ile Asn Leu Val Gly Val Val
980 985 990
gcc ggc atc tcc tac gcc atc aac agc ggg tac cag tcg tgg ggt ccg 3264
Ala Gly Ile Ser Tyr Ala Ile Asn Ser Gly Tyr Gln Ser Trp Gly Pro
995 1000 1005
ctc ttc ggc aag ctc ttc ttc gcc ttc tgg gtg atc gtt cac ctg tac 3312
Leu Phe Gly Lys Leu Phe Phe Ala Phe Trp Val Ile Val His Leu Tyr
1010 1015 1020 1025
ccg ttc ctc aag ggt ctc atg ggt cgg cag aac cgc acc ccg acc atc 3360
Pro Phe Leu Lys Gly Leu Met Gly Arg Gln Asn Arg Thr Pro Thr Ile
1030 1035 1040
gtg gtt gtc tgg gcg atc ctg ctg gcg tcg atc ttc tcc ttg ctg tgg 3408
Val Val Val Trp Ala Ile Leu Leu Ala Ser Ile Phe Ser Leu Leu Trp
1045 1050 1055
gtt cgc atc gat ccg ttc acc aac cgc gtc act ggc ccg gat act cga 3456
Val Arg Ile Asp Pro Phe Thr Asn Arg Val Thr Gly Pro Asp Thr Arg
1060 1065 1070
acg tgt ggc atc aac tgc tag ggaggtggaa ggtttgtaga aacagagaga 3507
Thr Cys Gly Ile Asn Cys
1075
taccacgaatgtgccgctgccacaaattgtctgttagtaagttatataggcaggtggcgt 3567
tatttacagctacgtacacacaaggggatactccgtttatcactggtgtgcattcttttg 3627
ttgatataagttactatatatacgtattgcttctactttgtggagagtggctgacaggac 3687
cagttttgtaatgttatgaacagcaaagaaataagttagtttccaaaaaaaaaaaaaaaa 3747
aaaaaaaaanaaaaaaaaaaaaaaaaananaaaanaaaaaaaaaaaaacccc 3799
<210> 6
<211> 1079
<212> PRT
<213> Zea mays
<400> 6
Met Glu Gly Asp Ala Asp Gly Val Lys Ser Gly Arg Arg Gly Gly Gly
1 5 10 15
Gln Val Cys Gln Ile Cys Gly Asp Gly Val Gly Thr Thr Ala Glu Gly
20 25 30
Asp Val Phe Thr Ala Cys Asp Val Cys Gly Phe Pro Val Cys Arg Pro
35 40 45
Cys Tyr Glu Tyr Glu Arg Lys Asp Gly Thr Gln Ala Cys Pro Gln Cys
50 55 60
Lys Asn Lys Tyr Lys Arg His Lys Gly Ser Pro Ala Ile Arg Gly Glu
65 70 75 80
Glu Gly Asp Asp Thr Asp Ala Asp Asp Ala Ser Asp Phe Asn Tyr Pro
85 90 95
Ala Ser Gly Asn Asp Asp Gln Lys Gln Lys Ile Ala Asp Arg Met Arg
100 105 110
Ser Trp Arg Met Asn Ala Gly Gly Ser Gly Asp Val Gly Arg Pro Lys
115 120 125
Tyr Asp Ser Gly Glu Ile Gly Leu Thr Lys Tyr Asp Ser Gly Glu Ile
130 135 140
Pro Arg Gly Tyr Ile Pro Ser Val Thr Asn Ser Gln Ile Ser Gly Glu
145 150 155 160
Ile Pro Gly Ala Ser Pro Asp His His Met Met Ser Pro Thr Gly Asn
165 170 175
Ile Gly Arg Arg Ala Pro Phe Pro Tyr Met Asn His Ser Ser Asn Pro
180 185 190
Ser Arg Glu Phe Ser Gly Ser Val Gly Asn Val Ala Trp Lys Glu Arg
195 200 205
Val Asp Gly Trp Lys Met Lys Gln Asp Lys Gly Thr Ile Pro Met Thr
210 215 220
Asn Gly Thr Ser Ile Ala Pro Ser Glu Gly Arg Gly Val Gly Asp Ile
225 230 235 240
Asp Ala Ser Thr Asp Tyr Asn Met Glu Asp Ala Leu Leu Asn Asp Glu
245 250 255
Thr Arg Gln Pro Leu Ser Arg Lys Val Pro Leu Pro Ser Ser Arg Ile
260 265 270
Asn Pro Tyr Arg Met Val Ile Val Leu Arg Leu Ile Val Leu Ser Ile
275 280 285
Phe Leu His Tyr Arg Ile Thr Asn Pro Val Arg Asn Ala Tyr Pro Leu
290 295 300
Trp Leu Leu Ser Val Ile Cys Glu Ile Trp Phe Ala Leu Ser Trp Ile
305 310 315 320
Leu Asp Gln Phe Pro Lys Trp Phe Pro Ile Asn Arg Glu Thr Tyr Leu
325 330 335
Asp Arg Leu Ala Leu Arg Tyr Asp Arg Glu Gly Glu Pro Ser Gln Leu
340 345 350
Ala Ala Val Asp Ile Phe Val Ser Thr Val Asp Pro Met Lys Glu Pro
355 360 365
Pro Leu Val Thr Ala Asn Thr Val Leu Ser Ile Leu Ala Val Asp Tyr
370 375 380
Pro Val Asp Lys Val Ser Cys Tyr Val Ser Asp Asp Gly Ala Ala Met
385 390 395 400
Leu Thr Phe Asp Ala Leu Ala Glu Thr Ser Glu Phe Ala Arg Lys Trp
405 410 415
Val Pro Phe Val Lys Lys Tyr Asn Ile Glu Pro Arg Ala Pro Glu Trp
420 425 430
Tyr Phe Ser Gln Lys Ile Asp Tyr Leu Lys Asp Lys Val His Pro Ser
435 440 445
Phe Val Lys Asp Arg Arg Ala Met Lys Arg Glu Tyr Glu Glu Phe Lys
450 455 460
Ile Arg Val Asn Gly Leu Val Ala Lys Ala Gln Lys Val Pro Glu Glu
465 470 475 480
Gly Trp Ile Met Gln Asp Gly Thr Pro Trp Pro Gly Asn Asn Thr Arg
485 490 495
Asp His Pro Gly Met Ile Gln Val Phe Leu Gly His Ser Gly Gly Leu
500 505 510
Asp Thr Glu Gly Asn Glu Leu Pro Arg Leu Val Tyr Val Ser Arg Glu
515 520 525
Lys Arg Pro Gly Phe Gln His His Lys Lys Ala Gly Ala Met Asn Ala
530 535 540
Leu Val Arg Val Ser Ala Val Leu Thr Asn Gly Gln Tyr Met Leu Asn
545 550 555 560
Leu Asp Cys Asp His Tyr Ile Asn Asn Ser Lys Ala Leu Arg Glu Ala
565 570 575
Met Cys Phe Leu Met Asp Pro Asn Leu Gly Arg Ser Val Cys Tyr Val
580 585 590
Gln Phe Pro Gln Arg Phe Asp Gly Ile Asp Arg Asn Asp Arg Tyr Ala
595 600 605
Asn Arg Asn Thr Val Phe Phe Asp Ile Asn Leu Arg Gly Leu Asp Gly
610 615 620
Ile Gln Gly Pro Val Tyr Val Gly Thr Gly Cys Val Phe Asn Arg Thr
625 630 635 640
Ala Leu Tyr Gly Tyr Glu Pro Pro Ile Lys Gln Lys Lys Gly Gly Phe
645 650 655
Leu Ser Ser Leu Cys Gly Gly Arg Lys Lys Gly Ser Lys Ser Lys Lys
660 665 670
Gly Ser Asp Lys Lys Lys Ser Gln Lys His Val Asp Ser Ser Val Pro
675 680 685
Val Phe Asn Leu Glu Asp Ile Glu Glu Gly Val Glu Gly Ala Gly Phe
690 695 700
Asp Asp Glu Lys Ser Leu Leu Met Ser Gln Met Ser Leu Glu Lys Arg
705 710 715 720
Phe Gly Gln Ser Ala Ala Phe Val Ala Ser Thr Leu Met Glu Tyr Gly
725 730 735
Gly Val Pro Gln Ser Ala Thr Pro Glu Ser Leu Leu Lys Glu Ala Ile
740 745 750
His Val Ile Ser Cys Gly Tyr Glu Asp Lys Ile Glu Trp Gly Thr Glu
755 760 765
Ile Gly Trp Ile Tyr Gly Ser Val Thr Glu Asp Ile Leu Thr Gly Phe
770 775 780
Lys Met His Ala Arg Gly Trp Arg Ser Ile Tyr Cys Met Pro Lys Arg
785 790 795 800
Pro Ala Phe Lys Gly Ser Ala Pro Ile Asn Leu Ser Asp Arg Leu Asn
805 810 815
Gln Val Leu Arg Trp Ala Leu Gly Ser Val Glu Ile Leu Phe Ser Arg
820 825 830
His Cys Pro Leu Trp Tyr Gly Tyr Gly Gly Arg Leu Lys Phe Leu Glu
835 840 845
Arg Phe Ala Tyr Ile Asn Thr Thr Ile Tyr Pro Leu Thr Ser Leu Pro
850 855 860
Leu Leu Ile Tyr Cys Ile Leu Pro Ala Ile Cys Leu Leu Thr Gly Lys
865 870 875 880
Phe Ile Ile Pro Glu Ile Ser Asn Phe Ala Ser Ile Trp Phe Ile Ser
885 890 895
Leu Phe Ile Ser Ile Phe Ala Thr Gly Ile Leu Glu Met Arg Trp Ser
900 905 910
Gly Val Gly Ile Asp Glu Trp Trp Arg Asn Glu Gln Phe Trp Val Ile
915 920 925
Gly Gly Ile Ser Ala His Leu Phe Ala Val Phe Gln Gly Leu Leu Lys
930 935 940
Val Leu Ala Gly Ile Asp Thr Asn Phe Thr Val Thr Ser Lys Ala Ser
945 950 955 960
Asp Glu Asp Gly Asp Phe Ala Glu Leu Tyr Met Phe Lys Trp Thr Thr
965 970 975
Leu Leu Ile Pro Pro Thr Thr Ile Leu Ile Ile Asn Leu Val Gly Val
980 985 990
Val Ala Gly Ile Ser Tyr Ala Ile Asn Ser Gly Tyr Gln Ser Trp Gly
995 1000 1005
Pro Leu Phe Gly Lys Leu Phe Phe Ala Phe Trp Val Ile Val His Leu
1010 1015 1020
Tyr Pro Phe Leu Lys Gly Leu Met Gly Arg Gln Asn Arg Thr Pro Thr
1025 1030 1035 1040
Ile Val Val Val Trp Ala Ile Leu Leu Ala Ser Ile Phe Ser Leu Leu
1045 1050 1055
Trp Val Arg Ile Asp Pro Phe Thr Asn Arg Val Thr Gly Pro Asp Thr
1060 1065 1070
Arg Thr Cys Gly Ile Asn Cys
1075
<210> 7
<211> 25
<212> DNA
<213> Zea mays
<400> 7
atggagggcg acgcggacgg cgtga 25
<210> 8
<211> 25
<212> DNA
<213> Zea mays
<400> 8
ctagcagttg atgccacacg ttcga 25
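Note on SEQ ID NOS: 7 and 8. The following is a minimal sketch, not part of the original listing, that checks how these two 25-mers relate to the open reading frame of SEQ ID NO: 5. It suggests that SEQ ID NO: 7 matches the first 25 nucleotides of the coding region (starting at the atg) and that SEQ ID NO: 8 is the reverse complement of the last 25 nucleotides (ending at the tag stop codon). Their use as a forward/reverse primer pair is an assumption made here for illustration. The names CDS_START, CDS_END and reverse_complement are hypothetical helpers, and the two flanking fragments are copied by hand from SEQ ID NO: 5 above.

```python
# Sequences copied from the listing (lowercase, as printed).
SEQ7 = "atggagggcgacgcggacggcgtga"   # SEQ ID NO: 7
SEQ8 = "ctagcagttgatgccacacgttcga"   # SEQ ID NO: 8

# First 30 nt and last 39 nt of the open reading frame in SEQ ID NO: 5,
# transcribed by hand from the codon lines above (illustrative excerpts).
CDS_START = "atggagggcgacgcggacggcgtgaagtcg"
CDS_END = "actggcccggatactcgaacgtgtggcatcaactgctag"

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# SEQ ID NO: 7 reads in the sense direction from the start codon.
forward_matches = CDS_START.startswith(SEQ7)

# SEQ ID NO: 8 reads on the opposite strand, ending at the stop codon.
reverse_matches = CDS_END.endswith(reverse_complement(SEQ8))

print(forward_matches, reverse_matches)
```

Under these assumptions, both checks hold for the excerpts shown, which is consistent with the two oligonucleotides flanking the full coding region.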
<210> 9
<211> 36
<212> DNA
<213> Artificial Sequence
<220>
<223> Designed oligonucleotide based upon the adapter
sequence and poly T to remove clones which have a
poly A tail but no cDNA.
<400> 9
tcgacccacg cgtccgaaaa aaaaaaaaaa aaaaaa 36
<210> 10
<211> 27
<212> DNA
<213> Zea mays
<400> 10
tgctgatatc gagaaggccg gaatcgt 27
<210> 11
<211> 21
<212> DNA
<213> Zea mays
<400> 11
ctccccacca gacccttgag g 21
<210> 12
<211> 32
<212> DNA
<213> Zea mays
<400> 12
agagaagcca acgccawcgc ctcyatttcg tc 32