Note: Descriptions are shown in the official language in which they were submitted.
CA 02387041 2002-04-10
WO 01/29213 PCT/GB00/04027
HUMAN SIT4 ASSOCIATED PROTEINS LIKE (SAPL) PROTEINS AND ENCODING GENES; USES
THEREOF
The present invention relates to nucleic acids, polypeptides,
oligonucleotide probes and primers, methods of diagnosis or
prognosis, and other methods relating to and based on the
cloning and characterisation of a gene which the present
inventors have termed SAPL, found in at least two isoforms
termed SAPLa and SAPLb.
Mammals and yeast use a similar mechanism that relies upon
cyclins and cyclin-dependent kinases (CDKs) to regulate the
cell cycle. In yeast the components of this mechanism include
the cyclins, CLN1 and CLN2, and the cyclin-dependent kinase
CDC28 (cell division control) (Dynlacht, 1997. Nature
389, 149-152). The activity of the serine/threonine kinase
CDC28, also known as CDK1, is essential for the completion of
G1 START, the controlling event in the yeast cell cycle. CDC28
activity is modulated by the level of the cyclins, CLN1 and
CLN2. The level of expression of CDC28 remains relatively
constant throughout the cell cycle. In contrast, the mRNA
expression level of the genes CLN1 and CLN2 increases
dramatically during late G1. This expression of CLN1 and CLN2
is dependent upon the SIT4 Ppase (protein phosphatase)
(Fernandez-Sarabia et al. 1992. Genes Dev. 6, 2417-2428). The
SIT4 Ppase is a type 2A phosphatase which is encoded by the
sit4 (sporulation-induced transcript 4) gene (Sutton et al.
1991. Mol. Cell. Biol. 11, 2133-2148). The SIT4 protein is
55% identical to the catalytic subunit of mammalian type 2A
phosphatase and 40% identical to mammalian type 1 phosphatase.
A human cDNA clone, protein phosphatase 6, has been obtained
that encodes a protein that when expressed in yeast has the
ability to complement a sit4 mutant (Bastians and Ponstingl,
1996. J. Cell Sci. 109, 2865-2874). Therefore it is likely
that protein phosphatase 6 or a related phosphatase is the
mammalian ortholog of SIT4.
Genetic analysis in yeast of sit4 mutations demonstrated that
the SIT4 Ppase is necessary for progression of the cell cycle
from late G1 to S phase with a temporal point of action, or
execution point, at or similar to that of CDC28 (Sutton et al.
1991. Mol. Cell. Biol. 11, 2133-2148). The SIT4 protein was
found to be associated with two proteins with apparent
molecular weights of 190 and 155 kD. The cloning of the genes
encoding the SIT4 Associated Proteins, SAP155 and SAP190,
resulted in the identification of two additional related genes
encoding the proteins SAP185 and SAP4 (Luke et al., 1996. Mol.
Cell. Biol. 16, 2744-2755). Alignment of the members of the
SAP family revealed a number of conserved residues (Luke et
al., 1996. Mol. Cell. Biol. 16, 2744-2755), some of which are
also present in SAPL. The SAP proteins appear to specifically
interact with the SIT4 phosphatase (Luke et al., 1996. Mol.
Cell. Biol. 16, 2744-2755). Deletion of all four SAP genes
results in a phenotype that is equivalent to a deletion of
sit4; thus their association with SIT4 is essential for its
function (Luke et al., 1996. Mol. Cell. Biol. 16, 2744-2755).
Since overexpression of SAP genes can suppress certain sit4
temperature-sensitive mutants, it is thought that the SAP
proteins act as positive modulators of SIT4 Ppase. The
mechanism by which the SAP proteins modulate SIT4 activity is
unknown. One possibility that has been suggested is that the
SAP proteins increase the substrate specificity of SIT4 Ppase,
in a fashion analogous to that found for the glycogen-targeting
subunit of type 1 phosphatases (Luke et al., 1996.
Mol. Cell. Biol. 16, 2744-2755). In this case SIT4 would be a
SAP-dependent phosphatase in a manner similar to CDC28 being a
cyclin-dependent kinase (Luke et al., 1996. Mol. Cell. Biol.
16, 2744-2755). Regardless of the mode of action, the
importance of the SAP proteins in the yeast cell cycle in
regulating the activity of a critical enzyme, SIT4 Ppase, is
well established.
The present inventors now disclose for the first time two
isoforms of a novel gene, arising from alternative splicing
and encoding highly related proteins, from the IDDM4 locus on
human chromosome 11q13. The isoforms have been termed by the
inventors "SAPLa" (SAP like) and "SAPLb".
BRIEF DESCRIPTION OF THE FIGURES
Figure 1(a) shows the nucleotide sequence of SAPLa cDNA.
Nucleotide numbering herein is by reference to this sequence.
Figure 1(b) shows the longest open reading frame of the SAPLa
cDNA.
Figure 1(c) shows the amino acid sequence translation of the
open reading frame in the SAPLa cDNA producing SAPL isoform a.
Amino acid residue numbering herein is by reference to this
sequence.
Figure 1(d) shows the amino acid sequence translation of an
alternative open reading frame in the SAPLa cDNA which starts
with a sequence that conforms with the Kozak consensus
sequence for efficient initiation of translation.
Figure 2(a) shows the nucleotide sequence of SAPLb cDNA.
Figure 2(b) shows the longest open reading frame of the SAPLb
cDNA.
Figure 2(c) shows the amino acid sequence translation of the
open reading frame in the SAPLb cDNA producing SAPL isoform b.
Figure 3 shows a multiple sequence alignment of the amino acid
sequence of yeast SAP190, yeast SAP185 and human SAPL isoform
a. The consensus sequence of the alignment is shown; capital
letters indicate identity at that position. The amino acid
residues that are underlined are conserved within the yeast
SAP family as well as SAPL.
Figure 4 shows the sequence of DNA found immediately adjacent
to SAPL exon 1 in the genome and identified as a putative
promoter. Sequences that match the consensus binding sites
for the Sp1 and NF kappa B transcription factors are shown in
capital letters. Sequences that are conserved in the syntenic
region of mouse genomic DNA sequence are underlined.
Figure 5 shows a multipoint linkage curve of the IDDM4 region.
Figure 6 shows the LOD score of the Tsp value obtained in
the analysis described below. The x-axis is not to scale.
Characteristics of SAPL cDNA and SAPL protein
Two full length cDNA sequences, that arise from alternative
splicing, were isolated from the IDDM4 locus on chromosome
11q13 and termed SAPLa (SAP like) and SAPLb. The SAPL gene is
also known to the inventors as DM4E4. The longest cDNA of 4793
nucleotides (Figure 1(a)) contains an open reading frame
(Figure 1(b)) that encodes a protein, SAPLa, of 793 amino
acids (Figure 1(c)). The putative initiator methionine codon
at nucleotide 278, AGCATGT, conforms to the Kozak consensus
sequence for efficient initiation of translation at the -3
(purine, preferably A) but not the +4 position (Kozak, (1996)
Mamm. Genome 7, 563-574). The predicted molecular weight of
this protein is 89 kDa with an isoelectric point of 4.31.
The protein does not contain any stretches of hydrophobic
amino acids that would have a high probability of serving as a
transmembrane spanning domain, nor does the protein contain a
signal peptide for protein export. Therefore SAPLa is most
likely localized to the cytoplasmic portion of the cell. A
nonoptimal initiation can lead to multiple start sites (Kozak,
(1996) Mamm. Genome 7, 563-574). The first ATG codon that
conforms to the Kozak consensus sequence is at nucleotide 482,
GACATGG, resulting in a protein of 725 amino acids
(Figure 1(d)). The SAPLa cDNA contains consensus signals for
polyadenylation at nucleotides 3592-3597 and 4115-4120.
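
By way of illustration only, the comparison of the two start codon
contexts quoted above can be sketched as follows. The sketch assumes
the usual reading of the Kozak consensus (a purine at -3 and a G at
+4, with the A of ATG as +1); the function names are hypothetical and
are not taken from any software used by the inventors. It also checks
that an in-frame start at nucleotide 482 rather than 278 is consistent
with a protein of 725 rather than 793 amino acids.

```python
from typing import Tuple

PURINES = {"A", "G"}


def kozak_flags(context: str) -> Tuple[bool, bool]:
    """Return (minus3_ok, plus4_ok) for a 7-nt window written as NNNATGN.

    Assumption: -3 should be a purine and +4 should be G.
    """
    context = context.upper()
    if len(context) != 7 or context[3:6] != "ATG":
        raise ValueError("expected a 7-nucleotide window of the form NNNATGN")
    minus3_ok = context[0] in PURINES  # position -3
    plus4_ok = context[6] == "G"       # position +4
    return minus3_ok, plus4_ok


def residues_lost(first_atg: int, later_atg: int) -> int:
    """Number of codons skipped when starting at a later in-frame ATG."""
    offset = later_atg - first_atg
    if offset % 3 != 0:
        raise ValueError("the two ATG codons are not in the same frame")
    return offset // 3


print(kozak_flags("AGCATGT"))             # context at nucleotide 278
print(kozak_flags("GACATGG"))             # context at nucleotide 482
print(793 - residues_lost(278, 482))      # length from the later start
```

Only the two seven-nucleotide windows and the nucleotide positions
come from the description; everything else is an illustrative
assumption.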
The second cDNA (Figure 2(a)), SAPLb, of 3228 nucleotides
contains an open reading frame (Figure 2(b)) that encodes a
protein (Figure 2(c)), SAPLb, of 791 amino acids. The two
proteins, SAPLa and SAPLb, are 100% identical for the first
776 amino acids. SAPLb has a predicted molecular weight of 89
kDa with an isoelectric point of 4.30; like SAPLa, it is
predicted to be expressed in the cytoplasm. Both SAPLa and
SAPLb contain a tandem repeat of the amino acid sequence
Ser-Thr-Asp-Ser-Glu-Glu (STDSEE) from amino acids 562-573.
Comparison of SAPL with the protein database using the
Smith-Waterman algorithm reveals a significant degree of similarity,
p = 6.01e-13 and p = 2.02e-12, to two members of a SAP family
of yeast proteins, SAP190 and SAP185 respectively. A lesser
degree of similarity, p = 2.05e-2, is found to a third member
of this family, SAP155. The amino acid sequence identity
between amino acids 94-724 of SAPL and SAP190 is 19%. Over a
similar region SAPL is 18% identical to SAP185. Using the
algorithm tFASTA (which translates all the nucleotide
sequences in the database in the 6 possible frames and
compares them with the amino acid sequence of the input protein
using the FASTA algorithm) to search for additional
mammalian genes with similarity to SAPL resulted in the
identification of EST sequences but no full length cDNA
sequences. Therefore the full length SAPL cDNA identified in
this application provides for the first time the determination
of amino acid sequence of a mammalian homolog of the yeast SAP
family.
A multiple sequence alignment of SAPL, SAP190 and SAP185
using the GCG program PileUp with the GapWeight set at
10 and the GapLengthWeight set at 1 yields the alignment shown
in Figure 3. This alignment reveals several conserved motifs,
a number of which are conserved within four members of the SAP
family (SAP4, SAP155, SAP185 and SAP190). The most strikingly
conserved motifs are located in SAPL at residues 333-338,
WNNFLH, and from 403-414, R(x)GYMGHLT(xx)A. There are also a
number of other conserved regions of note from 102 to 108,
LL(x)(K/R)L(aromatic)S, and from 163 to 168,
MD(hydrophobic)LL(K/R). Although the SAPL STDSEE repeats are
not conserved there are a number of conserved acidic residues
in this portion of the protein, i.e. residues 539 to 591.
Several of these conserved motifs are found in all members of
the SAP family, not only SAP190 and SAP185 which are the most
similar to SAPL (Figure 3). These include portions of the
previously noted motifs: the motif from 333-338,
WNNF(hydrophobic)H, and the motif from 403-414, GYMG. A number
of other residues which are identical between human SAPL and
the members of the SAP family are indicated in Figure 3. The
SAP proteins are not that similar to each other, e.g. SAP185
exhibits only 14% and 42% identity to SAP155 and SAP190,
respectively. The finding that the protein contains motifs
that are conserved within this family provides a strong
indication that it is related to the yeast SAP family.
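
As a non-limiting sketch, the motif notation used above may be
translated into regular expressions and used to scan a candidate
protein sequence. The residue classes taken for "aromatic" and
"hydrophobic" are assumptions made for illustration, the function
name is hypothetical, and the toy sequence is synthetic; it is not
the SAPL sequence of Figure 1(c).

```python
import re

# Assumed residue classes; the description does not enumerate them.
AROMATIC = "FWY"
HYDROPHOBIC = "AILMFVWC"

# (x) in the text means any residue, written "." in the regex.
MOTIFS = {
    "WNNFLH": re.compile(r"WNNFLH"),
    "R(x)GYMGHLT(xx)A": re.compile(r"R.GYMGHLT..A"),
    "LL(x)(K/R)L(aromatic)S": re.compile(rf"LL.[KR]L[{AROMATIC}]S"),
    "MD(hydrophobic)LL(K/R)": re.compile(rf"MD[{HYDROPHOBIC}]LL[KR]"),
}


def find_motifs(protein):
    """Return {motif name: [1-based start positions]} for a protein string."""
    seq = protein.upper()
    return {
        name: [m.start() + 1 for m in rx.finditer(seq)]
        for name, rx in MOTIFS.items()
    }


# Synthetic test sequence: one copy of each motif separated by "AA".
toy = "AA" + "WNNFLH" + "AA" + "RQGYMGHLTSSA" + "AA" + "LLEKLFS" + "AA" + "MDILLK" + "AA"
for name, starts in find_motifs(toy).items():
    print(name, starts)
```

Applied to a real sequence, the reported start positions could be
compared with the residue ranges given above (102-108, 163-168,
333-338 and 403-414).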
A number of potential protein phosphorylation sites are found
in the SAPL protein (Table 4). These include sites for the
cAMP dependent protein kinase, protein kinase C, and casein
kinase 2. Protein phosphorylation is a reversible
modification of proteins and an important mechanism for
modulating protein function. Furthermore in yeast the
deletion of the sit4 gene results in hyperphosphorylation of
the SAP proteins (Luke et al., 1996. Mol. Cell. Biol. 16,
2744-2755); therefore there is direct evidence that this
family of proteins is subject to protein phosphorylation.
Thus it is likely that SAPL is phosphorylated at least at some
of the sites listed in Table 4. Furthermore, it is likely
that SAPL function is modulated by protein phosphorylation.
Protein phosphorylation of SAPL may be used in assays for
compounds that modulate the level of SAPL phosphorylation.
These compounds may inhibit either kinases or phosphatases
that act on SAPL. Compounds isolated in such a fashion may
have therapeutic utility in modifying the function of SAPL.
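
Purely as an illustration of how potential phosphorylation sites of
the three kinase classes named above might be located in a protein
sequence, the following sketch scans for commonly used consensus
patterns. The patterns are PROSITE-style assumptions ([RK][RK]x[ST]
for the cAMP dependent protein kinase, [ST]x[RK] for protein kinase C
and [ST]xx[DE] for casein kinase 2); they are not stated in this
description and need not be the patterns used to compile Table 4.
The function name is hypothetical.

```python
import re

# Assumed PROSITE-style consensus patterns, wrapped in lookaheads so
# that overlapping candidate sites are all reported.
SITE_PATTERNS = {
    "cAMP-dependent protein kinase": r"(?=([RK]{2}.[ST]))",
    "protein kinase C": r"(?=([ST].[RK]))",
    "casein kinase 2": r"(?=([ST]..[DE]))",
}


def scan_sites(protein):
    """Return a list of (kinase, 1-based match start, matched residues)."""
    seq = protein.upper()
    hits = []
    for kinase, pattern in SITE_PATTERNS.items():
        for m in re.finditer(pattern, seq):
            hits.append((kinase, m.start() + 1, m.group(1)))
    return hits


# The tandem STDSEE repeat described for residues 562-573 of SAPL.
for hit in scan_sites("STDSEESTDSEE"):
    print(hit)
```

A pattern match of this kind indicates only a candidate site; whether
a given residue is phosphorylated would have to be established
experimentally.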
The cloning of the SAPL cDNA permits overexpression of the
SAPL protein, and various isoforms, and testing of ability to
complement SAP mutants in yeast. Similarly, expression of the
human SAPL in yeast allows for the testing of a physical
association between SAPL and SIT4. The cloning of the SAPL
cDNA also allows the testing of the ability of SAPL to
interact and modulate the activity of human protein
phosphatase 6 and related phosphatases. Usefulness of SAPL in
screening for molecules of pharmaceutical potential is
discussed further below.
Since the activity of phosphatases, such as a SIT4 ortholog,
may be necessary for progression of the cell cycle, compounds
that inhibit the activity of the phosphatase may be useful in
the treatment of cancer and other proliferative disorders.
The SAPL protein may act in a manner analogous to the SAP
proteins in yeast either to activate the phosphatase or modify
its specificity. This too indicates usefulness of SAPL in
assays for compounds that modulate the activity of the
phosphatase, e.g. inhibit it.
There is evidence that the cyclin/CDK system is used to
monitor environmental factors that influence not only cell
division but apoptosis in terminally differentiated cells (Gao
and Zelenka, (1997). BioEssays 19, 307-315). Since certain
cyclins are expressed in T-cells this mechanism may be
important in mediating T-cell apoptosis. Apoptosis of
selected T-cell populations is a critical element in the
control of the immune system and the prevention of
autoimmunity. Therefore the location of SAPL within the IDDM4
locus and its proposed biological function of modulating
either the activity or specificity of a phosphatase may
indicate that this protein is important in maintaining immune
self-tolerance. Compounds that modify the activity of SAPL
may be tested in assays of T-cell proliferation or apoptosis.
Compounds able to modify SAPL activity may be identified by
ability to stimulate or inhibit SAPL complementation in mutant
yeast deleted for all four yeast SAP genes (Luke et al. (1996)
Mol. Cell. Biol. 16: 2744-2755).
The presence of polymorphisms within the SAPL gene and the
location of this gene within the IDDM4 locus allow for use of
certain of the polymorphisms as diagnostic markers. These
polymorphisms may be used to assay for the presence of a
chromosomal region that confers susceptibility to type 1
diabetes. This susceptibility may be due to functional
polymorphisms within the SAPL gene itself or may be due to a
functional polymorphism within a neighboring gene that is in
linkage disequilibrium with a SAPL polymorphism.
According to one aspect of the present invention there is
provided a nucleic acid molecule encoding a SAPL polypeptide,
which may be any of the SAPLa polypeptide isoforms of which
the amino acid sequences are shown in Figure 1(c) and Figure
1(d) and the SAPLb polypeptide isoform of which the amino acid
sequence is shown in Figure 2(c).
Thus, individual aspects of the present invention provide
nucleic acid encoding a polypeptide including the amino acid
sequence shown in Figure 1(c), Figure 1(d) or Figure 2(c).
Furthermore, an additional aspect of the present invention
provides nucleic acid encoding a polypeptide which includes
the first 776 amino acids shown in Figure 1(c) and Figure 2(c)
which are identical for the respective SAPL isoforms a and b.
A coding sequence of the present invention may be that shown
included in Figure 1(a), Figure 1(b), Figure 2(a) or Figure
2(b), or it may be a mutant, variant, derivative or allele of
one of the sequences shown. The sequence may differ from that
shown in a said figure by a change which is one or more of
addition, insertion, deletion and substitution of one or more
nucleotides of the sequence shown. Changes to a nucleotide
sequence may result in an amino acid change at the protein
level, or not, as determined by the genetic code.
Thus, nucleic acid according to the present invention may
include a sequence different from the sequence shown in a
figure herein yet encode a polypeptide with the same amino
acid sequence.
On the other hand the encoded polypeptide may comprise an
amino acid sequence which differs by one or more amino acid
residues from the amino acid sequence shown in Figure 1(c),
Figure 1(d) or Figure 2(c). Nucleic acid encoding a
polypeptide which is an amino acid sequence mutant, variant,
derivative or allele of the sequence shown in one of these
figures is further provided by the present invention. Such
polypeptides are discussed below. Nucleic acid encoding such
a polypeptide may show at the nucleotide sequence and/or
encoded amino acid level greater than about 60% homology with
the coding sequence shown in the relevant figure and/or the
amino acid sequence shown in the relevant figure, greater than
about 70% homology, greater than about 80% homology, greater
than about 90% homology or greater than about 95% homology.
For amino acid "homology", this may be understood to be
similarity (according to the established principles of amino
acid similarity, e.g. as determined using the algorithm GAP
(Genetics Computer Group, Madison, WI)) or identity. GAP uses
the Needleman and Wunsch algorithm to align two complete
sequences in a way that maximizes the number of matches and
minimizes the number of gaps. Generally, the default parameters
are used, with a gap creation penalty = 12 and gap extension
penalty = 4. Use of GAP may be preferred but other algorithms
may be used, e.g. BLAST (which uses the method of Altschul et
al. (1990) J. Mol. Biol. 215: 405-410), FASTA (which uses the
method of Pearson and Lipman (1988) PNAS USA 85: 2444-2448),
or the Smith-Waterman algorithm (Smith and Waterman (1981) J.
Mol. Biol. 147: 195-197), generally employing default
parameters. Use of either of the terms "homology" and
"homologous" herein does not imply any necessary evolutionary
relationship between compared sequences, in keeping for
example with standard use of terms such as "homologous
recombination" which merely requires that two nucleotide
sequences are sufficiently similar to recombine under the
appropriate conditions. Further discussion of polypeptides
according to the present invention, which may be encoded by
nucleic acid according to the present invention, is found
below.
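
For illustration, once two sequences have been aligned (for example
by GAP), a percentage identity can be computed from the aligned
strings and compared against the thresholds recited above. The sketch
below does not perform the alignment itself and is not the GAP
program; it assumes that identity is counted over aligned columns in
which neither sequence has a gap, which is one of several conventions
in use. The function names and the toy alignment are hypothetical.

```python
THRESHOLDS = (95, 90, 80, 70, 60)  # "greater than about N%" levels from the text


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gap columns are excluded under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def highest_threshold_met(pid):
    """Return the largest recited threshold that pid exceeds, or None."""
    for threshold in THRESHOLDS:
        if pid > threshold:
            return threshold
    return None


pid = percent_identity("ACGT-A", "ACCTTA")  # toy alignment, not SAPL data
print(pid, highest_threshold_met(pid))
```

Counting identity over the full alignment length, including gap
columns, would give a lower figure for the same alignment, so the
convention should be stated whenever a percentage is quoted.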
The present invention extends to nucleic acid that hybridizes
with any one or more of the specific sequences disclosed
herein under stringent conditions. For example, for detection
of sequences that are about 80-90% identical, suitable
conditions include hybridization overnight at 42°C in
0.25M Na2HPO4, pH 7.2, 6.5% SDS, 10% dextran sulfate
and a final wash at 55°C in 0.1X SSC, 0.1% SDS. For detection
of sequences that are greater than about 90% identical,
suitable conditions include hybridization overnight at 65°C in
0.25M Na2HPO4, pH 7.2, 6.5% SDS, 10% dextran sulfate and a
final wash at 60°C in 0.1X SSC, 0.1% SDS.
The coding sequence may be included within a nucleic acid
molecule which has the sequence shown in Figure 1(a) or Figure
2(a) and encode the full polypeptide of Figure 1(c) or Figure
2(c). Mutants, variants, derivatives and alleles of these
sequences are included within the scope of the present
invention in terms analogous to those set out in the preceding
paragraph and in the following disclosure. The same applies
for the second isoform of SAPLa, of which the amino acid
sequence is shown in Figure 1(d).
Alterations in a sequence according to the present invention
which are associated with IDDM or other disease may be
preferred in accordance with embodiments of the present
invention. Implications for screening, e.g. for diagnostic or
prognostic purposes, are discussed below. Particular
nucleotide sequence alleles according to the present invention
have sequences with a variation indicated in Table 2. One or
more of these may be associated with susceptibility to IDDM or
other disease.
Generally, nucleic acid according to the present invention is
provided as an isolate, in isolated and/or purified form, or
free or substantially free of material with which it is
naturally associated, such as free or substantially free of
nucleic acid flanking the gene in the human genome, except
possibly one or more regulatory sequence(s) for expression.
Nucleic acid may be wholly or partially synthetic and may
include genomic DNA, cDNA or RNA. The coding sequence shown
herein is a DNA sequence. Where nucleic acid according to the
invention includes RNA, reference to the sequence shown should
be construed as encompassing reference to the RNA equivalent,
with U substituted for T.
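
As a minimal sketch of the RNA equivalent referred to above, a DNA
sequence written in the four-letter alphabet can be converted by
replacing each T with U. The function name is hypothetical, and the
example input is the seven-nucleotide start codon context quoted
earlier in this description.

```python
def rna_equivalent(dna):
    """Return the RNA equivalent of a DNA sequence (U substituted for T)."""
    seq = dna.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.replace("T", "U")


print(rna_equivalent("AGCATGT"))
```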
Nucleic acid may be provided as part of a replicable vector,
and also provided by the present invention are a vector
including nucleic acid as set out above, particularly any
expression vector from which the encoded polypeptide can be
expressed under appropriate conditions, and a host cell
containing any such vector or nucleic acid. An expression
vector in this context is a nucleic acid molecule including
nucleic acid encoding a polypeptide of interest and
appropriate regulatory sequences for expression of the
polypeptide, in an in vitro expression system, e.g.
reticulocyte lysate, or in vivo, e.g. in eukaryotic cells such
as COS or CHO cells or in prokaryotic cells such as E. coli.
This is discussed further below.
The nucleic acid sequence provided in accordance with the
present invention is useful for identifying nucleic acid of
interest (and which may be according to the present invention)
in a test sample. The present invention provides a method of
obtaining nucleic acid of interest, the method including
hybridisation of a probe having a sequence shown herein, or a
complementary sequence, to target nucleic acid. Hybridisation
is generally followed by identification of successful
hybridisation and isolation of nucleic acid which has
hybridised to the probe, which may involve one or more steps
of PCR. It will not usually be necessary to use a probe with
the complete sequence shown in any of these figures. Shorter
fragments, particularly fragments with a sequence encoding the
conserved motifs may be used.
Nucleic acid according to the present invention is obtainable
using one or more oligonucleotide probes or primers designed
to hybridise with one or more fragments of the nucleic acid
sequence shown in any of the figures, particularly fragments
of relatively rare sequence, based on codon usage or
statistical analysis. A primer designed to hybridise with a
fragment of the nucleic acid sequence shown in any of the
figures may be used in conjunction with one or more
oligonucleotides designed to hybridise to a sequence in a
cloning vector within which target nucleic acid has been
cloned, or in so-called "RACE" (rapid amplification of cDNA
ends) in which cDNAs in a library are ligated to an
oligonucleotide linker and PCR is performed using a primer
which hybridises with a sequence shown and a primer which
hybridises to the oligonucleotide linker.
Such oligonucleotide probes or primers, as well as the
full-length sequence (and mutants, alleles, variants and
derivatives) are also useful in screening a test sample
containing nucleic acid for the presence of alleles, mutants
and variants, with diagnostic and/or prognostic implications
as discussed in more detail below.
Nucleic acid isolated and/or purified from one or more cells
(e.g. human) or a nucleic acid library derived from nucleic
acid isolated and/or purified from cells (e.g. a cDNA library
derived from mRNA isolated from the cells), may be probed
under conditions for selective hybridisation and/or subjected
to a specific nucleic acid amplification reaction such as the
polymerase chain reaction (PCR) (reviewed for instance in "PCR
Protocols: A Guide to Methods and Applications", Eds. Innis
et al, 1990, Academic Press, New York, Mullis et al, Cold
Spring Harbor Symp. Quant. Biol., 51:263, (1987), Ehrlich
(ed), PCR technology, Stockton Press, NY, 1989, and Ehrlich et
al, Science, 252:1643-1650, (1991)). PCR comprises steps of
denaturation of template nucleic acid (if double-stranded),
annealing of primer to target, and polymerisation. The
nucleic acid probed or used as template in the amplification
reaction may be genomic DNA, cDNA or RNA. Other specific
nucleic acid amplification techniques include strand
displacement activation, the Qβ replicase system, the repair
chain reaction, the ligase chain reaction and ligation
activated transcription. For convenience, and because it is
generally preferred, the term PCR is used herein in contexts
where other nucleic acid amplification techniques may be
applied by those skilled in the art. Unless the context
requires otherwise, reference to PCR should be taken to cover
use of any suitable nucleic acid amplification reaction
available in the art.
In the context of cloning, it may be necessary for one or more
gene fragments to be ligated to generate a full-length coding
sequence. Also, where a full-length encoding nucleic acid
molecule has not been obtained, a smaller molecule
representing part of the full molecule, may be used to obtain
full-length clones. Inserts may be prepared from partial cDNA
clones and used to screen cDNA libraries. The full-length
clones isolated may be subcloned into expression vectors and
activity assayed by transfection into suitable host cells,
e.g. with a reporter plasmid.
A method may include hybridisation of one or more (e.g. two)
probes or primers to target nucleic acid. Where the nucleic
acid is double-stranded DNA, hybridisation will generally be
preceded by denaturation to produce single-stranded DNA. The
hybridisation may be as part of a PCR procedure, or as part of
a probing procedure not involving PCR. An example procedure
would be a combination of PCR and low stringency
hybridisation. A screening procedure, chosen from the many
available to those skilled in the art, is used to identify
successful hybridisation events and isolated hybridised
nucleic acid.
Binding of a probe to target nucleic acid (e.g. DNA) may be
measured using any of a variety of techniques at the disposal
of those skilled in the art. For instance, probes may be
radioactively, fluorescently or enzymatically labelled. Other
methods not employing labelling of probe include examination
of restriction fragment length polymorphisms, amplification
using PCR, RNase cleavage and allele specific oligonucleotide
probing. Probing may employ the standard Southern blotting
technique. For instance DNA may be extracted from cells and
digested with different restriction enzymes. Restriction
fragments may then be separated by electrophoresis on an
agarose gel, before denaturation and transfer to a
nitrocellulose filter. Labelled probe may be hybridised to
the DNA fragments on the filter and binding determined. DNA
for probing may be prepared from RNA preparations from cells.
Preliminary experiments may be performed by hybridising under
low stringency conditions various probes to Southern blots of
DNA digested with restriction enzymes. Suitable conditions
would be achieved when a large number of hybridising fragments
were obtained while the background hybridisation was low.
Using these conditions nucleic acid libraries, e.g. cDNA
libraries representative of expressed sequences, may be
searched. Those skilled in the art are well able to employ
suitable conditions of the desired stringency for selective
hybridisation, taking into account factors such as
oligonucleotide length and base composition, temperature and
so on. On the basis of amino acid sequence information,
oligonucleotide probes or primers may be designed, taking into
account the degeneracy of the genetic code, and, where
appropriate, codon usage of the organism from which the
candidate nucleic acid is derived. An oligonucleotide for use in
nucleic acid amplification may have about 10 or fewer codons
(e.g. 6, 7 or 8), i.e. be about 30 or fewer nucleotides in
length (e.g. 18, 21 or 24). Generally specific primers are
upwards of 14 nucleotides in length, but need not be longer
than 18-20. Those skilled in the art are well versed in the
design of primers for use in processes such as PCR. Various
techniques for synthesizing oligonucleotide primers are well
known in the art, including phosphotriester and phosphodiester
synthesis methods.
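
The relationship between codon number and primer length, and the
effect of the degeneracy of the genetic code on primer design, can be
sketched as follows. The codon counts are those of the standard
genetic code; the function names are hypothetical, and the example
peptide is the conserved WNNFLH motif noted earlier in this
description. No allowance is made here for codon usage of the source
organism.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNTS = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def primer_length(peptide):
    """Nucleotide length of a primer spanning every codon of the peptide."""
    return 3 * len(peptide)


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNTS[residue]  # KeyError for non-standard residues
    return total


for codons in (6, 7, 8):
    print(codons, "codons ->", 3 * codons, "nucleotides")

print(primer_length("WNNFLH"), degeneracy("WNNFLH"))
```

A peptide rich in Met and Trp, which each have a single codon, gives
a less degenerate primer pool than one rich in Leu, Ser or Arg, which
is one reason a particular stretch of a conserved motif may be
favoured when designing primers.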
Preferred amino acid sequences suitable for use in the design
of probes or PCR primers may include sequences conserved
(completely, substantially or partly) encoding the motifs
highlighted in Figure 3.
A further aspect of the present invention provides an
oligonucleotide or polynucleotide fragment of the nucleotide
sequence shown in any of the figures herein providing nucleic
acid according to the present invention, or a complementary
sequence, in particular for use in a method of obtaining
and/or screening nucleic acid. Some preferred
oligonucleotides have a sequence shown in Table 1, or a
sequence which differs from any of the sequences shown by
addition, substitution, insertion or deletion of one or more
nucleotides, but preferably without abolition of ability to
hybridise selectively with nucleic acid in accordance with the
present invention, that is wherein the degree of similarity of
the oligonucleotide or polynucleotide with one of the
sequences given is sufficiently high.
In some preferred embodiments, oligonucleotides according to
the present invention that are fragments of any of the
sequences shown, or any allele associated with IDDM or other
disease susceptibility, are at least about 10 nucleotides in
length, more preferably at least about 15 nucleotides in
length, more preferably at least about 20 nucleotides in
length. Such fragments themselves individually represent
aspects of the present invention. Fragments and other
oligonucleotides may be used as primers or probes as discussed
but may also be generated (e.g. by PCR) in methods concerned
with determining the presence in a test sample of a sequence
indicative of IDDM or other disease susceptibility.
Methods involving use of nucleic acid in diagnostic and/or
prognostic contexts, for instance in determining
susceptibility to IDDM or other disease, and other methods
concerned with determining the presence of sequences
indicative of IDDM or other disease susceptibility are
discussed below.
Further embodiments of oligonucleotides according to the
present invention are anti-sense oligonucleotide sequences
based on the nucleic acid sequences described herein.
Anti-sense oligonucleotides may be designed to hybridise to the
complementary sequence of nucleic acid, pre-mRNA or mature
mRNA, interfering with the production of polypeptide encoded
by a given DNA sequence (e.g. either native polypeptide or a
mutant form thereof), so that its expression is reduce or
prevented altogether. Anti-sense techniques may be used to
target a coding sequence, a control sequence of a gene, e.g.
Sin the 5' flanking sequence, whereby the antisense
oligonucleotides can interfere with control sequences. Anti-
sense oligonucleotides may be DNA or RNA and may be of around
14-23 nucleotides, particularly around 15-18 nucleotides, in
length. The construction of antisense sequences and their use
is described in Peyman and Ulman, Chemical Reviews, 90:543-584,
(1990), and Crooke, Ann. Rev. Pharmacol. Toxicol.,
32:329-376, (1992).
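By way of illustration only, the length ranges given above may be applied as in the following minimal sketch, which enumerates candidate anti-sense oligonucleotides of 15-18 nucleotides as reverse complements of windows of a sense-strand target. The function names and the example target are hypothetical and are not taken from the sequences of the figures; no claim is made that any candidate so generated is effective.

```python
# Hypothetical helper names; the target sequence is an arbitrary example.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def antisense_candidates(target, min_len=15, max_len=18):
    """List (start, length, oligo) for every window of the sense target.

    Each oligo is the reverse complement of the window, so it can
    hybridise to the sense sequence over that window.
    """
    target = target.upper()
    candidates = []
    for length in range(min_len, max_len + 1):
        for start in range(len(target) - length + 1):
            window = target[start:start + length]
            candidates.append((start, length, reverse_complement(window)))
    return candidates


if __name__ == "__main__":
    example_target = "ATGGCCACGTTAGCCGATCA"  # arbitrary 20-mer
    for start, length, oligo in antisense_candidates(example_target)[:3]:
        print(start, length, oligo)
```

In practice candidates would be filtered further, for instance for specificity against other transcripts; that filtering is outside this sketch.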
Nucleic acid according to the present invention may be used in
methods of gene therapy, for instance in treatment of
individuals with the aim of preventing or curing (wholly or
partially) IDDM or other disease. This may ease one or more
symptoms of the disease. This is discussed below.
Nucleic acid according to the present invention, such as a
full-length coding sequence or oligonucleotide probe or
primer, may be provided as part of a kit, e.g. in a suitable
container such as a vial in which the contents are protected
from the external environment. The kit may include
instructions for use of the nucleic acid, e.g. in PCR and/or a
method for determining the presence of nucleic acid of
interest in a test sample. A kit wherein the nucleic acid is
intended for use in PCR may include one or more other reagents
required for the reaction, such as polymerase, nucleosides,
buffer solution etc. The nucleic acid may be labelled. A kit
for use in determining the presence or absence of nucleic acid
of interest may include one or more articles and/or reagents
for performance of the method, such as means for providing the
test sample itself, e.g. a swab for removing cells from the
buccal cavity or a syringe for removing a blood sample (such
components generally being sterile).
According to a further aspect, the present invention provides
a nucleic acid molecule including a SAPL gene promoter.
The promoter may comprise or consist essentially of a sequence
of nucleotides 5' to the SAPL gene in the human chromosome, or
an equivalent sequence in another species, such as the mouse.
Any of the sequences disclosed in the figures herein may be
used to construct a probe for use in identification and
isolation of a promoter from a genomic library containing a
genomic SAPL gene. Techniques and conditions for such probing
are well known in the art and are discussed elsewhere herein.
To find minimal elements or motifs responsible for tissue
and/or developmental regulation, restriction enzymes or
nucleases may be used to digest a nucleic acid molecule,
followed by an appropriate assay (for example using a reporter
gene such as luciferase) to determine the sequence required.
A preferred embodiment of the present invention provides a
nucleic acid isolate with the minimal nucleotide sequence
required for SAPL promoter activity.
Figure 4 shows a sequence for the putative SAPL promoter.
Underlined sequences exhibit similarity to the syntenic mouse
DNA sequence. Sequence in bold is found in the SAPL cDNA
sequence of Figure 1(a). Capital letters indicate bases that
match the pattern for Sp1 transcription factor binding sites.
GGGGGTCC matches an NF kappa B transcription factor binding
site. GCCAAT matches the CAAT site. The sequence was
identified as a putative promoter by the computer algorithm
PROMOTERSCAN (Prestridge (1995) J. Mol. Biol. 249: 923-932) and
corresponds to a CpG island.
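The two literal motifs named above can be located in a candidate sequence by simple string matching, as in the following sketch. It is an assumption-laden illustration and not the PROMOTERSCAN algorithm: only exact matches on the given strand are reported, the motif labels and the function name are hypothetical, and the Sp1 pattern is omitted because it is not spelled out in the text.

```python
# Literal motifs quoted in the description; the labels are hypothetical.
MOTIFS = {
    "NF-kappa-B-like": "GGGGGTCC",
    "GCCAAT-box": "GCCAAT",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return sorted (label, 0-based start) for every exact motif match."""
    seq = sequence.upper()
    hits = []
    for label, motif in motifs.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((label, start))
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    toy = "ttGGGGGTCCaaGCCAATgg"  # arbitrary toy sequence, not Figure 4
    print(find_motifs(toy))
```

A fuller analysis would also scan the reverse complement and permit degenerate positions.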
As noted, the promoter may comprise one or more sequence
motifs or elements conferring developmental and/or
tissue-specific regulatory control of expression. Other regulatory
sequences may be included, for instance as identified by
mutation or digest assay in an appropriate expression system
or by sequence comparison with available information, e.g.
using a computer to search on-line databases.
By "promoter" is meant a sequence of nucleotides from which
transcription may be initiated of DNA operably linked
downstream (i.e. in the 3' direction on the sense strand of
double-stranded DNA).
"Operably linked" means joined as part of the same nucleic
acid molecule, suitably positioned and oriented for
transcription to be initiated from the promoter. DNA operably
linked to a promoter is "under transcriptional initiation
regulation" of the promoter.
The present invention extends to a promoter which has a
nucleotide sequence which is an allele, mutant, variant or
derivative, by way of nucleotide addition, insertion,
substitution or deletion of a promoter sequence as provided
herein. Preferred levels of sequence homology with a provided
sequence may be analogous to those set out above for encoding
nucleic acid and polypeptides according to the present
invention. Systematic or random mutagenesis of nucleic acid
to make an alteration to the nucleotide sequence may be
performed using any technique known to those skilled in the
art. One or more alterations to a promoter sequence according
to the present invention may increase or decrease promoter
activity, or increase or decrease the magnitude of the effect
of a substance able to modulate the promoter activity.
"Promoter activity" is used to refer to ability to initiate
transcription. The level of promoter activity is quantifiable
for instance by assessment of the amount of mRNA produced by
transcription from the promoter or by assessment of the amount
of protein product produced by translation of mRNA produced by
transcription from the promoter. The amount of a specific
mRNA present in an expression system may be determined for
example using specific oligonucleotides which are able to
hybridise with the mRNA and which are labelled or may be used
in a specific amplification reaction such as the polymerase
chain reaction. Use of a reporter gene facilitates
determination of promoter activity by reference to protein
production.
Further provided by the present invention is a nucleic acid
construct comprising a SAPL promoter region or a fragment,
mutant, allele, derivative or variant thereof able to promote
transcription, operably linked to a heterologous gene, e.g. a
coding sequence. A "heterologous" or "exogenous" gene is
generally not a modified form of SAPL. Generally, the gene
may be transcribed into mRNA which may be translated into a
peptide or polypeptide product which may be detected and
preferably quantitated following expression. A gene whose
encoded product may be assayed following expression is termed
a "reporter gene", i.e. a gene which "reports" on promoter
activity.
The reporter gene preferably encodes an enzyme which catalyses
a reaction which produces a detectable signal, preferably a
visually detectable signal, such as a coloured product. Many
examples are known, including β-galactosidase and luciferase.
β-Galactosidase activity may be assayed by production of blue
colour on substrate, the assay being by eye or by use of a
spectrophotometer to measure absorbance. Fluorescence, for
example that produced as a result of luciferase activity, may
be quantitated using a spectrophotometer. Radioactive assays
may be used, for instance using chloramphenicol
acetyltransferase, which may also be used in non-radioactive
assays. The presence and/or amount of gene product resulting
from expression from the reporter gene may be determined using
a molecule able to bind the product, such as an antibody or
fragment thereof. The binding molecule may be labelled
directly or indirectly using any standard technique.
Those skilled in the art are well aware of a multitude of
possible reporter genes and assay techniques which may be used
to determine gene activity. Any suitable reporter/assay may
be used and it should be appreciated that no particular choice
is essential to or a limitation of the present invention.
Nucleic acid constructs comprising a promoter (as disclosed
herein) and a heterologous gene (reporter) may be employed in
screening for a substance able to modulate activity of the
promoter. For therapeutic purposes, e.g. for treatment of
IDDM or other disease, a substance able to up-regulate
expression of the promoter may be sought. A method of
screening for ability of a substance to modulate activity of a
promoter may comprise contacting an expression system, such as
a host cell, containing a nucleic acid construct as herein
disclosed with a test or candidate substance and determining
expression of the heterologous gene.
The level of expression in the presence of the test substance
may be compared with the level of expression in the absence of
the test substance. A difference in expression in the
presence of the test substance indicates ability of the
substance to modulate gene expression. An increase in
expression of the heterologous gene compared with expression
of another gene not linked to a promoter as disclosed herein
indicates specificity of the substance for modulation of the
promoter.
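The comparison just described may be sketched as follows. This is an illustrative calculation under stated assumptions, not a prescribed analysis: the function names are hypothetical, the readings are arbitrary reporter units, and the two-fold threshold is an assumed cut-off that does not appear in this description.

```python
def fold_change(with_substance, without_substance):
    """Ratio of expression with the test substance to expression without it."""
    if without_substance <= 0:
        raise ValueError("expression without the substance must be positive")
    return with_substance / without_substance


def is_specific_upregulator(reporter_with, reporter_without,
                            control_with, control_without,
                            threshold=2.0):
    """True if the promoter-linked reporter rises by at least `threshold`
    fold while a control gene not linked to the promoter does not.

    The threshold of 2.0 is an assumption chosen for illustration.
    """
    reporter_fc = fold_change(reporter_with, reporter_without)
    control_fc = fold_change(control_with, control_without)
    return reporter_fc >= threshold and control_fc < threshold


if __name__ == "__main__":
    print(fold_change(300.0, 100.0))
    print(is_specific_upregulator(300.0, 100.0, 110.0, 100.0))
```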
A promoter construct may be introduced into a cell line using
any technique previously described to produce a stable cell
line containing the reporter construct integrated into the
genome. The cells may be grown and incubated with test
compounds for varying times. The cells may be grown in 96-well
plates to facilitate the analysis of large numbers of
compounds. The cells may then be washed and the reporter gene
expression analysed. For some reporters, such as luciferase,
the cells will be lysed and then analysed.
Following identification of a substance which modulates or
affects promoter activity, the substance may be investigated
further. Furthermore, it may be manufactured and/or used in
preparation, i.e. manufacture or formulation, of a composition
such as a medicament, pharmaceutical composition or drug.
These may be administered to individuals.
Thus, the present invention extends in various aspects not
only to a substance identified using a nucleic acid molecule
as a modulator of promoter activity, in accordance with what
is disclosed herein, but also a pharmaceutical composition,
medicament, drug or other composition comprising such a
substance, a method comprising administration of such a
composition to a patient, e.g. for increasing SAPL expression
for instance in treatment (which may include preventative
treatment) of IDDM or other disease, use of such a substance
in manufacture of a composition for administration, e.g. for
increasing SAPL expression for instance in treatment of IDDM
or other disease, and a method of making a pharmaceutical
composition comprising admixing such a substance with a
pharmaceutically acceptable excipient, vehicle or carrier, and
optionally other ingredients.
A further aspect of the present invention provides a
polypeptide which has the amino acid sequence shown in Figure
1(c), Figure 1(d) or Figure 2(c), or includes the first 776
amino acids of Figure 1(c) and Figure 2(c), which are
identical between SAPLa and SAPLb, which may be in isolated
and/or purified form, free or substantially free of material
with which it is naturally associated, such as other
polypeptides or such as human polypeptides other than that for
which the amino acid sequence is shown in a said figure, or
(for example if produced by expression in a prokaryotic cell)
lacking in native glycosylation, e.g. unglycosylated.
Polypeptides which are amino acid sequence variants, alleles,
derivatives or mutants are also provided by the present
invention. A polypeptide which is a variant, allele,
derivative or mutant may have an amino acid sequence which
differs from that given in a figure herein by one or more of
addition, substitution, deletion and insertion of one or more
amino acids. Preferred such polypeptides have SAPL function,
that is to say have one or more of the following properties:
immunological cross-reactivity with an antibody reactive with the
polypeptide for which the sequence is given in a figure
herein; sharing an epitope with the polypeptide for which the
amino acid sequence is shown in a figure herein (as determined
for example by immunological cross-reactivity between the two
polypeptides); a biological activity which is inhibited by an
antibody raised against the polypeptide whose sequence is
shown in a figure herein; ability to complement the yeast
mutation; containing one or more of the conserved sequences
identified in Figure 3; containing the STDSEE repeat.
Alteration of sequence may change the nature and/or level of
activity and/or stability of the SAPL protein.
A polypeptide which is an amino acid sequence variant, allele,
derivative or mutant of the amino acid sequence shown in a
figure herein may comprise an amino acid sequence which shares
greater than about 35% sequence identity with the sequence
shown, greater than about 40%, greater than about 50%, greater
than about 60%, greater than about 70%, greater than about
80%, greater than about 90% or greater than about 95%. The
sequence may share greater than about 60% similarity, greater
than about 70% similarity, greater than about 80% similarity
or greater than about 90% similarity with the amino acid
sequence shown in the relevant figure. Amino acid similarity
is generally defined with reference to the algorithm GAP
(Genetics Computer Group, Madison, WI) as noted above, or the
TBLASTN program, of Altschul et al. (1990) J. Mol. Biol. 215:
403-10, or the Smith-Waterman algorithm (Smith and Waterman
(1981) J. Mol. Biol. 147: 195-197). Similarity allows for
"conservative variation", i.e. substitution of one hydrophobic
residue such as isoleucine, valine, leucine or methionine for
another, or the substitution of one polar residue for another,
such as arginine for lysine, glutamic acid for aspartic acid, or
glutamine for asparagine. Particular amino acid sequence
variants may differ from that shown in a figure herein by
insertion, addition, substitution or deletion of 1 amino acid,
2, 3, 4, 5-10, 10-20, 20-30, 30-50, 50-100, 100-150, or more
than 150 amino acids.
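The distinction between identity and similarity may be illustrated by the following sketch, which scores two sequences that have already been aligned to equal length. It does not reproduce GAP, TBLASTN or the Smith-Waterman algorithm; the function name is hypothetical, gap positions (written "-") are simply skipped, and the conservative groups are limited to the examples named in this description.

```python
# Conservative groups taken from the examples in the text:
# hydrophobic I/V/L/M, and the polar pairs R/K, E/D and Q/N.
CONSERVATIVE_GROUPS = [set("IVLM"), set("KR"), set("DE"), set("NQ")]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over aligned,
    non-gap positions of two equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # simplification: gaps are ignored, not penalised
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif any(x in group and y in group for group in CONSERVATIVE_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    print(identity_and_similarity("STDSEE", "STDSDE"))
```

In this example one position differs by a glutamic acid to aspartic acid substitution, so similarity exceeds identity.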
Sequence comparison may be made over the full-length of the
relevant sequence shown herein, or may more preferably be over
a contiguous sequence of about or greater than about 20, 25,
30, 40, 50, 60, 70, 80, 90, 100, 133, 150, 167, 200, 233, 250,
267, 300, 333, 350, 400, 450, 500, 550, 600, 650, 700, 750,
760, 770, 776, 780, or 790 amino acids or nucleotide triplets,
compared with the relevant amino acid sequence or nucleotide
sequence as the case may be.
The present invention also includes peptides which include or
consist of fragments of a polypeptide of the invention.
The skilled person can use the techniques described herein and
others well known in the art to produce large amounts of
peptides, for instance by expression from encoding nucleic
acid.
Peptides can also be generated wholly or partly by chemical
synthesis. The compounds of the present invention can be
readily prepared according to well-established, standard
liquid or, preferably, solid-phase peptide synthesis methods,
general descriptions of which are broadly available (see, for
example, in J.M. Stewart and J.D. Young, Solid Phase Peptide
Synthesis, 2nd edition, Pierce Chemical Company, Rockford,
Illinois (1984), in M. Bodanszky and A. Bodanszky, The
Practice of Peptide Synthesis, Springer Verlag, New York
(1984); and Applied Biosystems 430A Users Manual, ABI Inc.,
Foster City, California), or they may be prepared in solution,
by the liquid phase method or by any combination of
solid-phase, liquid phase and solution chemistry, e.g. by first
completing the respective peptide portion and then, if desired
and appropriate, after removal of any protecting groups being
present, by introduction of the residue X by reaction of the
respective carbonic or sulfonic acid or a reactive derivative
thereof.
The present invention also includes active portions,
fragments, derivatives and functional mimetics of the
polypeptides of the invention. An "active portion" of a
polypeptide means a peptide which is less than said full
length polypeptide, but which retains a biological activity
such as disclosed herein.
A "fragment" of a polypeptide generally means a stretch of
amino acid residues of at least about five contiguous amino
acids, often at least about seven contiguous amino acids,
typically at least about nine contiguous amino acids, more
preferably at least about 13 contiguous amino acids, and, more
preferably, at least about 20 to 30 or more contiguous amino
acids. Fragments of the SAPL polypeptide sequence may include
antigenic determinants or epitopes useful for raising
antibodies to a portion of the amino acid sequence. Alanine
scans are commonly used to find and refine peptide motifs
within polypeptides, this involving the systematic replacement
of each residue in turn with the amino acid alanine, followed
by an assessment of biological activity.
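The systematic replacement involved in an alanine scan can be sketched as below. The function name is hypothetical, and the input shown is the six-residue STDSEE repeat used purely as a short example; the subsequent assessment of biological activity is an experimental step that code cannot represent.

```python
def alanine_scan(peptide):
    """Return (position, original residue, variant) for each single
    alanine substitution; positions are 1-based.

    Residues that are already alanine are skipped, since replacing
    them would give back the unchanged peptide.
    """
    peptide = peptide.upper()
    variants = []
    for index, residue in enumerate(peptide):
        if residue == "A":
            continue
        variant = peptide[:index] + "A" + peptide[index + 1:]
        variants.append((index + 1, residue, variant))
    return variants


if __name__ == "__main__":
    for position, residue, variant in alanine_scan("STDSEE"):
        print(f"{residue}{position}A", variant)
```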
Preferred fragments of SAPL include those with any of the
following amino acid sequences:
HPSQEEDRHSNASQ
RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE
PSSSPEQRTGQPSAPGDTS
which may be used for instance in raising or isolating
antibodies. Variant and derivative peptides, peptides which
have an amino acid sequence which differs from one of these
sequences by way of addition, insertion, deletion or
substitution of one or more amino acids are also provided by
the present invention, generally with the proviso that the
variant or derivative peptide is bound by an antibody or other
specific binding member which binds one of the peptides whose
sequence is shown. A peptide which is a variant or
derivative of one of the shown peptides may compete with the
shown peptide for binding to a specific binding member, such
as an antibody or antigen-binding fragment thereof.
Where additional amino acids are included in a peptide, these
may be heterologous or foreign to the polypeptide of the
invention, and the peptide may be about 20, 25, 30 or 35 amino
acids in length. A peptide according to this aspect may be
included within a larger fusion protein, particularly where
the peptide is fused to a non-SAPL (i.e. heterologous or
foreign) sequence, such as a polypeptide or protein domain.
A "derivative" of a polypeptide or a fragment thereof may
include a polypeptide modified by varying the amino acid
sequence of the protein, e.g. by manipulation of the nucleic
acid encoding the protein or by altering the protein itself.
Such derivatives of the natural amino acid sequence may
involve one or more of insertion, addition, deletion or
substitution of one or more amino acids, which may be without
fundamentally altering the qualitative nature of biological
activity of the wild type polypeptide. Also encompassed
within the scope of the present invention are functional
mimetics of active fragments of the SAPL polypeptides provided
(including alleles, mutants, derivatives and variants). The
term "functional mimetic" means a substance which may not
contain an active portion of the relevant amino acid sequence,
and probably is not a peptide at all, but which retains in
qualitative terms biological activity of natural SAPL
polypeptide. The design and screening of candidate mimetics
is described in detail below.
Other fragments of the polypeptides for which sequence
information is provided herein are provided as aspects of the
present invention, for instance corresponding to functional
domains.
A polypeptide according to the present invention may be
isolated and/or purified (e.g. using an antibody) for instance
after production by expression from encoding nucleic acid (for
which see below). Thus, a polypeptide may be provided free or
substantially free from contaminants with which it is
naturally associated (if it is a naturally-occurring
polypeptide). A polypeptide may be provided free or
substantially free of other polypeptides. Polypeptides
according to the present invention may be generated wholly or
partly by chemical synthesis. The isolated and/or purified
polypeptide may be used in formulation of a composition, which
may include at least one additional component, for example a
pharmaceutical composition including a pharmaceutically
acceptable excipient, vehicle or carrier. A composition
including a polypeptide according to the invention may be used
in prophylactic and/or therapeutic treatment as discussed
below.
A polypeptide, peptide, allele, mutant, derivative or variant
according to the present invention may be used as an immunogen
or otherwise in obtaining specific antibodies. Antibodies are
useful in purification and other manipulation of polypeptides
and peptides, diagnostic screening and therapeutic contexts.
This is discussed further below.
A polypeptide according to the present invention may be used
in screening for molecules which affect or modulate its
activity or function, e.g. binding to or modulating the
activity of a protein phosphatase, or ability to complement
SAP mutant yeast. Such molecules may interact with SAPL or
with one or more accessory molecules, and may be useful in a
therapeutic (possibly including prophylactic) context.
It is well known that pharmaceutical research leading to the
identification of a new drug may involve the screening of very
large numbers of candidate substances, both before and even
after a lead compound has been found. This is one factor
which makes pharmaceutical research very expensive and time-
consuming. Means for assisting in the screening process can
have considerable commercial importance and utility. Such
means for screening for substances potentially useful in
treating or preventing IDDM or other disease is provided by
polypeptides according to the present invention. Substances
identified as modulators of the polypeptide represent an
advance in the fight against IDDM and other diseases since
they provide a basis for design and investigation of
therapeutics for in vivo use. Furthermore, they may be useful
in any of a number of conditions, including autoimmune
diseases, such as glomerulonephritis, diseases and disorders
involving cellular proliferation, such as psoriasis, tumors
and cancer, given the functional indications for SAPL,
discussed elsewhere herein. As noted elsewhere, SAPL,
fragments thereof, and nucleic acid according to the invention
may also be useful in combatting any of these diseases and
disorders.
In various further aspects the present invention relates to
screening and assay methods and means, and substances
identified thereby.
Thus, further aspects of the present invention provide the use
of a polypeptide or peptide (particularly a fragment of a
polypeptide of the invention as disclosed), and/or encoding
nucleic acid therefor, in screening or searching for and/or
obtaining/identifying a substance, e.g. peptide or chemical
compound, which interacts and/or binds with the polypeptide or
peptide and/or interferes with its function or activity or
that of another substance, e.g. polypeptide or peptide, which
interacts and/or binds with the polypeptide or peptide of the
invention. For instance, a method according to one aspect of
the invention includes providing a polypeptide or peptide of
the invention and bringing it into contact with a substance,
which contact may result in binding between the polypeptide or
peptide and the substance. Binding may be determined by any
of a number of techniques available in the art, both
qualitative and quantitative.
In various aspects the present invention is concerned with
provision of assays for substances which inhibit interaction
between a polypeptide of the invention and one or more protein
phosphatases, particularly those similar to SIT4 such as human
protein phosphatase 6 (Bastians and Ponstingl (1996) J. Cell
Sci. 109: 2865-2874).
Further assays are for substances which interact with or bind
a polypeptide of the invention and/or modulate one or more of
its activities.
One aspect of the present invention provides an assay which
includes:
(a) bringing into contact a polypeptide or peptide according
to the invention and a putative binding molecule or other test
substance; and
(b) determining interaction or binding between the
polypeptide or peptide and the test substance.
A substance which interacts with the polypeptide or peptide of
the invention may be isolated and/or purified, manufactured
and/or used to modulate its activity as discussed.
A further aspect of the present invention provides an assay
method which includes:
(a) bringing into contact a substance including a SAPL
polypeptide or fragment, mutant, variant or derivative
thereof, a substance including a fragment of a second
polypeptide or a fragment, mutant, variant or derivative of
said second polypeptide, which is able to bind the SAPL
polypeptide; and a test compound, under conditions in which in
the absence of the test compound being an inhibitor, the two
said substances interact;
(b) determining interaction between said substances.
It is not necessary to use the entire proteins for assays of
the invention which test for binding between two molecules.
Fragments may be generated and used in any suitable way known
to those of skill in the art. Suitable ways of generating
fragments include, but are not limited to, recombinant
expression of a fragment from encoding DNA. Such fragments
may be generated by taking encoding DNA, identifying suitable
restriction enzyme recognition sites either side of the
portion to be expressed, and cutting out said portion from the
DNA. The portion may then be operably linked to a suitable
promoter in a standard commercially available expression
system. Another recombinant approach is to amplify the
relevant portion of the DNA with suitable PCR primers. Small
fragments (e.g. up to about 20 or 30 amino acids) may also be
generated using peptide synthesis methods which are well known
in the art.
The precise format of the assay of the invention may be varied
by those of skill in the art using routine skill and
knowledge. For example, the interaction between the
polypeptides may be studied in vitro by labelling one with a
detectable label and bringing it into contact with the other
which has been immobilised on a solid support. Suitable
detectable labels include 35S-methionine which may be
incorporated into recombinantly produced peptides and
polypeptides. Recombinantly produced peptides and
polypeptides may also be expressed as a fusion protein
containing an epitope which can be labelled with an antibody.
Fusion proteins may be generated that incorporate six
histidine residues at either the N-terminus or C-terminus of
the recombinant protein. Such a histidine tag may be used for
purification of the protein by using commercially available
columns which contain a metal ion, either nickel or cobalt
(Clontech, Palo Alto, CA, USA). These tags also serve for
detecting the protein using commercially available monoclonal
antibodies directed against the six histidine residues
(Clontech, Palo Alto, CA, USA).
The protein which is immobilized on a solid support may be
immobilized using an antibody against that protein bound to a
solid support or via other technologies which are known per
se. A preferred in vitro interaction may utilise a fusion
protein including glutathione-S-transferase (GST). This may
be immobilized on glutathione agarose beads. In an in vitro
assay format of the type described above a test compound can
be assayed by determining its ability to diminish the amount
of labelled peptide or polypeptide which binds to the
immobilized GST-fusion polypeptide. This may be determined by
fractionating the glutathione-agarose beads by SDS-
polyacrylamide gel electrophoresis. Alternatively, the beads
may be rinsed to remove unbound protein and the amount of
protein which has bound can be determined by counting the
amount of label present in, for example, a suitable
scintillation counter.
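Where bound label is counted as described, the ability of a test compound to diminish binding may be expressed as a percentage inhibition, for instance as in the following sketch. The function name, the use of counts per minute and the background subtraction are assumptions made for illustration; they are not specified in this description.

```python
def percent_inhibition(cpm_with_compound, cpm_without_compound,
                       cpm_background=0.0):
    """Percentage reduction in bound label caused by a test compound.

    All arguments are scintillation counts (assumed counts per minute).
    `cpm_background` is an assumed blank reading, e.g. beads without
    the immobilised fusion polypeptide.
    """
    signal_without = cpm_without_compound - cpm_background
    if signal_without <= 0:
        raise ValueError("uninhibited signal must exceed background")
    signal_with = cpm_with_compound - cpm_background
    return 100.0 * (1.0 - signal_with / signal_without)


if __name__ == "__main__":
    print(percent_inhibition(600.0, 1100.0, cpm_background=100.0))
```

A negative result would indicate that more label bound in the presence of the compound than in its absence.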
An assay according to the present invention may also take the
form of an in vivo assay. The in vivo assay may be performed
in a cell line such as a yeast strain in which the relevant
polypeptides or peptides are expressed from one or more
vectors introduced into the cell.
A method of screening for a substance which modulates activity
of a polypeptide may include contacting one or more test
substances with the polypeptide in a suitable reaction medium,
testing the activity of the treated polypeptide and comparing
that activity with the activity of the polypeptide in
comparable reaction medium untreated with the test substance
or substances. A difference in activity between the treated
and untreated polypeptides is indicative of a modulating
effect of the relevant test substance or substances.
In a further aspect of the invention there is provided an
assay method which includes:
(a) bringing into contact a substance including a fragment of
a polypeptide according to the invention including a putative
phosphorylation site, e.g. as identified in Table 4, or a
mutant, variant or derivative thereof and a test compound in
the presence of a kinase under conditions in which the kinase
normally phosphorylates said fragment, mutant, variant or
derivative; and
(b) determining phosphorylation of said fragment, mutant,
variant or derivative.
The kinase may be, for example, cAMP dependent protein kinase,
protein kinase C, or casein kinase 2.
Phosphorylation may be determined for example by immobilising
a polypeptide of the invention, or fragment, mutant, variant
or derivative thereof, e.g. on a bead or plate, and detecting
phosphorylation using an antibody or other binding molecule
which binds the relevant site of phosphorylation with a
different affinity when the site is phosphorylated from when
the site is not phosphorylated. Such antibodies may be
obtained by means of any standard technique as discussed
elsewhere herein, e.g. using a phosphorylated peptide (such as
a fragment of a SAPL polypeptide). Binding of a binding
molecule which discriminates between the phosphorylated and
non-phosphorylated form of the polypeptide or relevant
fragment, mutant, variant or derivative thereof may be
assessed using any technique available to those skilled in the
art, which may involve determination of the presence of a
suitable label, such as fluorescence. Phosphorylation may be
determined by immobilisation of the polypeptide or a fragment,
mutant, variant or derivative thereof, on a suitable substrate
such as a bead or plate, wherein the substrate is impregnated
with scintillant, such as in a standard scintillation
proximity assay, with phosphorylation being determined via
measurement of the incorporation of radioactive phosphate.
Phosphate incorporation into a polypeptide or a fragment,
mutant, variant or derivative thereof, may be determined by
precipitation with acid, such as trichloroacetic acid, and
collection of the precipitate on a suitable material such as
nitrocellulose filter paper, followed by measurement of
incorporation of radiolabeled phosphate. SDS-PAGE separation
of substrate may be employed followed by detection of
radiolabel.
Combinatorial library technology (Schultz, JS (1996)
Biotechnol. Prog. 12:729-743) provides an efficient way of
testing a potentially vast number of different substances for
ability to modulate activity of a polypeptide. Prior to or as
well as being screened for modulation of activity, test
substances may be screened for ability to interact with the
polypeptide, e.g. in a yeast two-hybrid system (which requires
that both the polypeptide and the test substance can be
expressed in yeast from encoding nucleic acid). This may be
used as a coarse screen prior to testing a substance for
actual ability to modulate activity of the polypeptide.
The amount of test substance or compound which may be added to
an assay of the invention will normally be determined by trial
and error depending upon the type of compound used.
Typically, from about 0.01 to 100 nM concentrations of
putative inhibitor compound may be used, for example from 0.1
to 10 nM. Greater concentrations may be used when a peptide
is the test substance.
Compounds which may be used may be natural or synthetic
chemical compounds used in drug screening programmes.
Extracts of plants which contain several characterised or
uncharacterised components may also be used. A further class
of putative inhibitor compounds can be derived from the SAPL
polypeptide and/or a ligand which binds. Peptide fragments of
from 5 to 40 amino acids, for example from 6 to 10 amino acids
from the region of the relevant polypeptide responsible for
interaction, may be tested for their ability to disrupt such
interaction.
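By way of illustration only, the enumeration of candidate peptide fragments from an interaction region may be sketched as follows. The function name and its parameters are hypothetical, and the example string is one of the peptide sequences listed elsewhere herein, used purely as sample input; it is not asserted to be a region responsible for interaction.

```python
def peptide_fragments(region, min_len=5, max_len=40):
    """Return (start, fragment) for every contiguous window of
    min_len..max_len residues taken from the given region."""
    fragments = []
    longest = min(max_len, len(region))
    for length in range(min_len, longest + 1):
        for start in range(len(region) - length + 1):
            fragments.append((start, region[start:start + length]))
    return fragments


# Sample input only (assumption): a 14-residue peptide string.
region = "HPSQEEDRHSNASQ"

# All hexamers from the region, with their 0-based start positions.
hexamers = peptide_fragments(region, min_len=6, max_len=6)
for start, fragment in hexamers:
    print(start, fragment)
```

Each fragment so enumerated would still need to be synthesised and tested for its ability to disrupt the interaction; the sketch only lists the candidates.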
Other candidate inhibitor compounds may be based on modelling
the 3-dimensional structure of a polypeptide or peptide
fragment and using rational drug design to provide potential
inhibitor compounds with particular molecular shape, size and
charge characteristics.
Following identification of a substance which modulates or
affects polypeptide activity, the substance may be
investigated further. Furthermore, it may be manufactured
and/or used in preparation, i.e. manufacture or formulation,
of a composition such as a medicament, pharmaceutical
composition or drug. These may be administered to
individuals.
Thus, the present invention extends in various aspects not
only to a substance identified as a modulator of polypeptide
activity, in accordance with what is disclosed herein, but
also a pharmaceutical composition, medicament, drug or other
composition comprising such a substance, a method comprising
administration of such a composition to a patient, e.g. for
treatment (which may include preventative treatment) of IDDM
or other disease, use of such a substance in manufacture of a
composition for administration, e.g. for treatment of IDDM or
other disease, and a method of making a pharmaceutical
composition comprising admixing such a substance with a
pharmaceutically acceptable excipient, vehicle or carrier, and
optionally other ingredients.
A substance identified as a modulator of polypeptide or
promoter function may be peptide or non-peptide in nature.
Non-peptide "small molecules" are often preferred for many in
vivo pharmaceutical uses. Accordingly, a mimetic or mimic of
the substance (particularly if a peptide) may be designed for
pharmaceutical use. The designing of mimetics to a known
pharmaceutically active compound is a known approach to the
development of pharmaceuticals based on a "lead" compound.
This might be desirable where the active compound is difficult
or expensive to synthesise or where it is unsuitable for a
particular method of administration, e.g. peptides are not
well suited as active agents for oral compositions as they
tend to be quickly degraded by proteases in the alimentary
canal. Mimetic design, synthesis and testing may be used to
avoid randomly screening large numbers of molecules for a
target property.
There are several steps commonly taken in the design of a
mimetic from a compound having a given target property.
Firstly, the particular parts of the compound that are
critical and/or important in determining the target property
are determined. In the case of a peptide, this can be done by
systematically varying the amino acid residues in the peptide,
e.g. by substituting each residue in turn. These parts or
residues constituting the active region of the compound are
known as its "pharmacophore".
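As a minimal sketch of substituting each residue in turn, the following generates one variant peptide per position. The choice of alanine as the replacement residue is an assumption made for the example (the text above does not specify a replacement), and the function name and the three-residue input are hypothetical.

```python
def substitution_scan(peptide, replacement="A"):
    """Return (position, original_residue, variant) tuples in which each
    residue of the peptide is replaced in turn (positions are 1-based)."""
    variants = []
    for i, residue in enumerate(peptide):
        if residue == replacement:
            # Replacing a residue with itself gives no new variant.
            continue
        variant = peptide[:i] + replacement + peptide[i + 1:]
        variants.append((i + 1, residue, variant))
    return variants


# Hypothetical three-residue peptide, for illustration only.
for position, original, variant in substitution_scan("GAK"):
    print(position, original, variant)
```

Variants whose activity is lost or markedly reduced on testing would point to residues that are candidates for inclusion in the pharmacophore.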
Once the pharmacophore has been found, its structure is
modelled according to its physical properties, e.g.
stereochemistry, bonding, size and/or charge, using data from
a range of sources, e.g. spectroscopic techniques, X-ray
diffraction data and NMR. Computational analysis, similarity
mapping (which models the charge and/or volume of a
pharmacophore, rather than the bonding between atoms) and
other techniques can be used in this modelling process.
In a variant of this approach, the three-dimensional structure
of the ligand and its binding partner are modelled. This can
be especially useful where the ligand and/or binding partner
change conformation on binding, allowing the model to take
account of this in the design of the mimetic.
A template molecule is then selected onto which chemical
groups which mimic the pharmacophore can be grafted. The
template molecule and the chemical groups grafted on to it can
conveniently be selected so that the mimetic is easy to
synthesise, is likely to be pharmacologically acceptable, and
does not degrade in vivo, while retaining the biological
activity of the lead compound. The mimetic or mimetics found
by this approach can then be screened to see whether they have
the target property, or to what extent they exhibit it.
Further optimisation or modification can then be carried out
to arrive at one or more final mimetics for in vivo or
clinical testing.
Mimetics of substances identified as having ability to
modulate SAPL polypeptide or promoter activity using a
screening method as disclosed herein are included within the
scope of the present invention. A polypeptide, peptide or
substance able to modulate activity of a polypeptide according
to the present invention may be provided in a kit, e.g. sealed
in a suitable container which protects its contents from the
external environment. Such a kit may include instructions for
use.
A convenient way of producing a polypeptide according to the
present invention is to express nucleic acid encoding it, by
use of the nucleic acid in an expression system. Accordingly,
the present invention also encompasses a method of making a
polypeptide (as disclosed), the method including expression
from nucleic acid encoding the polypeptide (generally nucleic
acid according to the invention). This may conveniently be
achieved by growing a host cell in culture, containing such a
vector, under appropriate conditions which cause or allow
expression of the polypeptide. Polypeptides may also be
expressed in in vitro systems, such as reticulocyte lysate.
Systems for cloning and expression of a polypeptide in a
variety of different host cells are well known. Suitable host
cells include bacteria, eukaryotic cells such as mammalian and
yeast, and baculovirus systems. Mammalian cell lines
available in the art for expression of a heterologous
polypeptide include Chinese hamster ovary cells, HeLa cells,
baby hamster kidney cells, COS cells and many others. A
common, preferred bacterial host is E. coli. Suitable vectors
can be chosen or constructed, containing appropriate
regulatory sequences, including promoter sequences, terminator
fragments, polyadenylation sequences, enhancer sequences,
marker genes and other sequences as appropriate. Vectors may
be plasmids, viral e.g. 'phage, or phagemid, as appropriate.
For further details see, for example, Molecular Cloning: a
Laboratory Manual: 2nd edition, Sambrook et al., 1989, Cold
Spring Harbor Laboratory Press. Many known techniques and
protocols for manipulation of nucleic acid, for example in
preparation of nucleic acid constructs, mutagenesis,
sequencing, introduction of DNA into cells and gene
expression, and analysis of proteins, are described in detail
in Current Protocols in Molecular Biology, Ausubel et al.
eds., John Wiley & Sons, 1992.
Thus, a further aspect of the present invention provides a
host cell containing nucleic acid as disclosed herein. The
nucleic acid of the invention may be integrated into the
genome (e.g. chromosome) of the host cell. Integration may be
promoted by inclusion of sequences which promote recombination
with the genome, in accordance with standard techniques. The
nucleic acid may be on an extra-chromosomal vector within the
cell.
A still further aspect provides a method which includes
introducing the nucleic acid into a host cell. The
introduction, which may (particularly for in vitro
introduction) be generally referred to without limitation as
"transformation", may employ any available technique. For
eukaryotic cells, suitable techniques may include calcium
phosphate transfection, DEAE-Dextran, electroporation,
liposome-mediated transfection and transduction using
retrovirus or other virus, e.g. adenovirus, vaccinia or, for
insect cells, baculovirus. For bacterial cells, suitable
techniques may include calcium chloride transformation,
electroporation and transfection using bacteriophage.
Marker genes such as antibiotic resistance or sensitivity
genes may be used in identifying clones containing nucleic
acid of interest, as is well known in the art.
The introduction may be followed by causing or allowing
expression from the nucleic acid, e.g. by culturing host cells
(which may include cells actually transformed although more
likely the cells will be descendants of the transformed
cells) under conditions for expression of the gene, so that
the encoded polypeptide is produced. If the polypeptide is
expressed coupled to an appropriate signal leader peptide it
may be secreted from the cell into the culture medium.
Following production by expression, a polypeptide may be
isolated and/or purified from the host cell and/or culture
medium, as the case may be, and subsequently used as desired,
e.g. in the formulation of a composition which may include one
or more additional components, such as a pharmaceutical
composition which includes one or more pharmaceutically
acceptable excipients, vehicles or carriers (e.g. see below).
Introduction of nucleic acid may take place in vivo by way of
gene therapy, as discussed below. A host cell containing
nucleic acid according to the present invention, e.g. as a
result of introduction of the nucleic acid into the cell or
into an ancestor of the cell and/or genetic alteration of the
sequence endogenous to the cell or ancestor (which
introduction or alteration may take place in vivo or ex vivo),
may be comprised (e.g. in the soma) within an organism which
is an animal, particularly a mammal, which may be human or
non-human, such as rabbit, guinea pig, rat, mouse or other
rodent, cat, dog, pig, sheep, goat, cattle or horse, or which
is a bird, such as a chicken. Genetically modified or
transgenic animals or birds comprising such a cell are also
provided as further aspects of the present invention.
Thus, in various further aspects, the present invention
provides a non-human animal with a human SAPL transgene within
its genome. The transgene may have the sequence of any of the
isoforms identified herein or a mutant, derivative, allele or
variant thereof as disclosed. In one preferred embodiment,
the heterologous human SAPL sequence replaces the endogenous
animal sequence. In other preferred embodiments, one or more
copies of the human SAPL sequence are added to the animal
genome. Preferably the animal is a rodent, and most
preferably mouse or rat.
This may have a therapeutic aim. (Gene therapy is discussed
below.) The presence of a mutant, allele or variant sequence
within cells of an organism, particularly when in place of a
homologous endogenous sequence, may allow the organism to be
used as a model in testing and/or studying the role of the
SAPL gene or substances which modulate activity of the encoded
polypeptide and/or promoter in vitro or are otherwise
indicated to be of therapeutic potential.
An animal model for SAPL deficiency may be constructed using
standard techniques for introducing mutations into an animal
germ-line. In one example of this approach, using a mouse, a
vector carrying an insertional mutation within the SAPL gene
may be transfected into embryonic stem cells. A selectable
marker, for example an antibiotic resistance gene such as
neoR, may be included to facilitate selection of clones in
which the mutant gene has replaced the endogenous wild type
homologue. Such clones may also be identified or further
investigated by Southern blot hybridisation. The clones may
then be expanded and cells injected into mouse blastocyst
stage embryos. Mice in which the injected cells have
contributed to the development of the mouse may be identified
by Southern blotting. These chimeric mice may then be bred to
produce mice which carry one copy of the mutation in the germ
line. These heterozygous mutant animals may then be bred to
produce mice carrying mutations in the gene homozygously. The
mice having a heterozygous mutation in the SAPL gene may be a
suitable model for human individuals having one copy of the
gene mutated in the germ line who are at risk of developing
IDDM or other disease.
Animal models may also be useful for any of the various
diseases discussed elsewhere herein.
Instead of or as well as being used for the production of a
polypeptide encoded by a transgene, host cells may be used as
a nucleic acid factory to replicate the nucleic acid of
interest in order to generate large amounts of it. Multiple
copies of nucleic acid of interest may be made within a cell
when coupled to an amplifiable gene such as dihydrofolate
reductase (DHFR), as is well known. Host cells transformed
with nucleic acid of interest, or which are descended from
host cells into which nucleic acid was introduced, may be
cultured under suitable conditions, e.g. in a fermentor, taken
from the culture and subjected to processing to purify the
nucleic acid. Following purification, the nucleic acid or one
or more fragments thereof may be used as desired, for instance
in a diagnostic or prognostic assay as discussed elsewhere
herein.
The provision of the novel SAPL polypeptide isoforms and
mutants, alleles, variants and derivatives enables for the
first time the production of antibodies able to bind these
molecules specifically.
Accordingly, a further aspect of the present invention
provides an antibody able to bind specifically to the
polypeptide whose sequence is given in a figure herein. Such
an antibody may be specific in the sense of being able to
distinguish between the polypeptide it is able to bind and
other human polypeptides for which it has no or substantially
no binding affinity (e.g. a binding affinity of about 1000x
less). Specific antibodies bind an epitope on the molecule
which is either not present or is not accessible on other
molecules. Antibodies according to the present invention may
be specific for the wild-type polypeptide. Antibodies
according to the invention may be specific for a particular
mutant, variant, allele or derivative polypeptide as between
that molecule and the wild-type polypeptide, so as to be
useful in diagnostic and prognostic methods as discussed
below. Antibodies are also useful in purifying the
polypeptide or polypeptides to which they bind, e.g. following
production by recombinant expression from encoding nucleic
acid.
Preferred antibodies according to the invention are isolated,
in the sense of being free from contaminants such as
antibodies able to bind other polypeptides and/or free of
serum components. Monoclonal antibodies are preferred for
some purposes, though polyclonal antibodies are within the
scope of the present invention.
Antibodies may be obtained using techniques which are standard
in the art. Methods of producing antibodies include
immunising a mammal (e.g. mouse, rat, rabbit, horse, goat,
sheep or monkey) with the protein or a fragment thereof.
Antibodies may be obtained from immunised animals using any of
a variety of techniques known in the art, and screened,
preferably using binding of antibody to antigen of interest.
For instance, Western blotting techniques or
immunoprecipitation may be used (Armitage et al., 1992,
Nature 357: 80-82). Isolation of antibodies and/or antibody-
producing cells from an animal may be accompanied by a step of
sacrificing the animal.
As an alternative or supplement to immunising a mammal with a
peptide, an antibody specific for a protein may be obtained
from a recombinantly produced library of expressed
immunoglobulin variable domains, e.g. using lambda
bacteriophage or filamentous bacteriophage which display
functional immunoglobulin binding domains on their surfaces;
for instance see WO92/01047. The library may be naive, that
is constructed from sequences obtained from an organism which
has not been immunised with any of the proteins (or
fragments), or may be one constructed using sequences obtained
from an organism which has been exposed to the antigen of
interest.
Suitable peptides for use in immunising an animal and/or
isolating anti-SAPL antibody include any of the following
amino acid sequences:
HPSQEEDRHSNASQ
RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE
PSSSPEQRTGQPSAPGDTS
Antibodies according to the present invention may be modified
in a number of ways. Indeed the term "antibody" should be
construed as covering any binding substance having a binding
domain with the required specificity. Thus the invention
covers antibody fragments, derivatives, functional equivalents
and homologues of antibodies, including synthetic molecules
and molecules whose shape mimics that of an antibody enabling
it to bind an antigen or epitope.
Example antibody fragments, capable of binding an antigen or
other binding partner are the Fab fragment consisting of the
VL, VH, CL and CH1 domains; the Fd fragment consisting of the
VH and CH1 domains; the Fv fragment consisting of the VL and
VH domains of a single arm of an antibody; the dAb fragment
which consists of a VH domain; isolated CDR regions and
F(ab')2 fragments, a bivalent fragment including two Fab
fragments linked by a disulphide bridge at the hinge region.
Single chain Fv fragments are also included.
A hybridoma producing a monoclonal antibody according to the
present invention may be subject to genetic mutation or other
changes. It will further be understood by those skilled in
the art that a monoclonal antibody can be subjected to the
techniques of recombinant DNA technology to produce other
antibodies or chimeric molecules which retain the specificity
of the original antibody. Such techniques may involve
introducing DNA encoding the immunoglobulin variable region,
or the complementarity determining regions (CDRs), of an
antibody to the constant regions, or constant regions plus
framework regions, of a different immunoglobulin. See, for
instance, EP184187A, GB 2188638A or EP-A-0239400. Cloning and
expression of chimeric antibodies are described in EP-A-
0120694 and EP-A-0125023.
Hybridomas capable of producing antibody with desired binding
characteristics are within the scope of the present
invention, as are host cells, eukaryotic or prokaryotic,
containing nucleic acid encoding antibodies (including
antibody fragments) and capable of their expression. The
invention also provides methods of production of the
antibodies including growing a cell capable of producing the
antibody under conditions in which the antibody is produced,
and preferably secreted.
The reactivities of antibodies on a sample may be determined
by any appropriate means. Tagging with individual reporter
molecules is one possibility. The reporter molecules may
directly or indirectly generate detectable, and preferably
measurable, signals. The linkage of reporter molecules may be
direct or indirect, covalent (e.g. via a peptide bond) or
non-covalent. Linkage via a peptide bond may be as a result
of recombinant expression of a gene fusion encoding antibody
and reporter molecule.
One favoured mode is by covalent linkage of each antibody with
an individual fluorochrome, phosphor or laser dye with
spectrally isolated absorption or emission characteristics.
Suitable fluorochromes include fluorescein, rhodamine,
phycoerythrin and Texas Red. Suitable chromogenic dyes
include diaminobenzidine.
Other reporters include macromolecular colloidal particles or
particulate material such as latex beads that are coloured,
magnetic or paramagnetic, and biologically or chemically
active agents that can directly or indirectly cause detectable
signals to be visually observed, electronically detected or
otherwise recorded. These molecules may be enzymes which
catalyse reactions that develop or change colours or cause
changes in electrical properties, for example. They may be
molecularly excitable, such that electronic transitions
between energy states result in characteristic spectral
absorptions or emissions. They may include chemical entities
used in conjunction with biosensors. Biotin/avidin or
biotin/streptavidin and alkaline phosphatase detection systems
may be employed.
The mode of determining binding is not a feature of the
present invention and those skilled in the art are able to
choose a suitable mode according to their preference and
general knowledge. Particular embodiments of antibodies
according to the present invention include antibodies able to
bind and/or which bind specifically, e.g. with an affinity of
at least 10-' M, to one of the following peptides:
HPSQEEDRHSNASQ
RIQQFDDGGSDEEDI
PESQRRSSSGSTDSE
PSSSPEQRTGQPSAPGDTS
Antibodies according to the present invention may be used in
screening for the presence of a polypeptide, for example in a
test sample containing cells or cell lysate as discussed, and
may be used in purifying and/or isolating a polypeptide
according to the present invention, for instance following
production of the polypeptide by expression from encoding
nucleic acid therefor. Antibodies may modulate the activity
of the polypeptide to which they bind and so, if that
polypeptide has a deleterious effect in an individual, may be
useful in a therapeutic context (which may include
prophylaxis).
An antibody may be provided in a kit, which may include
instructions for use of the antibody, e.g. in determining the
presence of a particular substance in a test sample. One or
more other reagents may be included, such as labelling
molecules, buffer solutions, elutants and so on. Reagents may
be provided within containers which protect them from the
external environment, such as a sealed vial.
The identification of the SAPL gene and indications of its
association with IDDM and other diseases paves the way for
aspects of the present invention to provide the use of
materials and methods, such as are disclosed and discussed
above, for establishing the presence or absence in a test
sample of a variant form of the gene, in particular an allele
or variant specifically associated with IDDM or other disease.
This may be for diagnosing a predisposition of an individual
to IDDM or other disease. It may be for diagnosing IDDM of a
patient with the disease as being associated with the SAPL
gene.
This allows for planning of appropriate therapeutic and/or
prophylactic treatment, permitting stream-lining of treatment
by targeting those most likely to benefit.
A variant form of the gene may contain one or more insertions,
deletions, substitutions and/or additions of one or more
nucleotides compared with the wild-type sequence (such as
shown in Table 2) which may or may not disrupt the gene
function. Differences at the nucleic acid level are not
necessarily reflected by a difference in the amino acid
sequence of the encoded polypeptide. However, a mutation or
other difference in a gene may result in a frame-shift or stop
codon, which could seriously affect the nature of the
polypeptide produced (if any), or a point mutation or gross
mutational change to the encoded polypeptide, including
insertion, deletion, substitution and/or addition of one or
more amino acids or regions in the polypeptide. A mutation in
a promoter sequence or other regulatory region may prevent or
reduce expression from the gene or affect the processing or
stability of the mRNA transcript. For instance, a sequence
alteration may affect alternative splicing of mRNA. As
discussed, various SAPL isoforms resulting from alternative
splicing are provided by the present invention.
There are various methods for determining the presence or
absence in a test sample of a particular nucleic acid
sequence, such as the sequence shown in any figure herein, or
a mutant, variant or allele thereof, e.g. including an
alteration shown in Table 2.
Tests may be carried out on preparations containing genomic
DNA, cDNA and/or mRNA. Testing cDNA or mRNA has the advantage
of the complexity of the nucleic acid being reduced by the
absence of intron sequences, but the possible disadvantage of
extra time and effort being required in making the
preparations. RNA is more difficult to manipulate than DNA
because of the widespread occurrence of RNases. Nucleic
acid in a test sample may be sequenced and the sequence
compared with the sequence shown in any of the figures herein,
to determine whether or not a difference is present. If so,
the difference can be compared with known susceptibility
alleles (e.g. as shown in Table 2) to determine whether the
test nucleic acid contains one or more of the variations
indicated, or the difference can be investigated for
association with IDDM or other disease.
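By way of illustration only, the comparison of a test sequence with a reference sequence may be sketched as follows. The sketch assumes the two sequences are already aligned and of equal length, so only substitutions are reported. The function names, the short example sequences and the `known_alterations` set are hypothetical and are not taken from the figures or Table 2.

```python
def point_differences(reference, test):
    """List (position, reference_base, test_base) for each position at
    which two aligned, equal-length sequences differ (1-based)."""
    if len(reference) != len(test):
        raise ValueError("sequences differ in length; align them first")
    pairs = zip(reference.upper(), test.upper())
    return [(i + 1, r, t) for i, (r, t) in enumerate(pairs) if r != t]


def classify(differences, known_alterations):
    """Split differences into those already catalogued and those that
    would need further investigation."""
    known = [d for d in differences if d in known_alterations]
    novel = [d for d in differences if d not in known_alterations]
    return known, novel


# Hypothetical sequences and catalogue, for illustration only.
reference = "ATGGCCTTA"
test = "ATGGTCTTA"
known_alterations = {(5, "C", "T")}

known, novel = classify(point_differences(reference, test), known_alterations)
print("known:", known)
print("novel:", novel)
```

Insertions and deletions shift the register of the comparison, so in practice an alignment step would precede a position-by-position check of this kind.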
Since it will not generally be time- or labour-efficient to
sequence all nucleic acid in a test sample or even the whole
SAPL gene, a specific amplification reaction such as PCR using
one or more pairs of primers may be employed to amplify the
region of interest in the nucleic acid, for instance the SAPL
gene or a particular region in which polymorphisms associated
with IDDM or other disease susceptibility occur. The
amplified nucleic acid may then be sequenced as above, and/or
tested in any other way to determine the presence or absence
of a particular feature. Nucleic acid for testing may be
prepared from nucleic acid removed from cells or in a library
using a variety of other techniques such as restriction enzyme
digest and electrophoresis.
Nucleic acid may be screened using a variant- or allele-
specific probe. Such a probe corresponds in sequence to a
region of the SAPL gene, or its complement, containing a
sequence alteration known to be associated with IDDM or other
disease susceptibility. Under suitably stringent conditions,
specific hybridisation of such a probe to test nucleic acid is
indicative of the presence of the sequence alteration in the
test nucleic acid. For efficient screening purposes, more
than one probe may be used on the same test sample.
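A much simplified sketch of variant-specific probing is given below. It models suitably stringent conditions as a requirement for a perfect match, and treats the test nucleic acid as double stranded so that a probe may pair with either strand; both are assumptions made for the example. The function names and the short sequences are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_matches(probe, target):
    """Return True if the probe sequence, or its reverse complement,
    occurs exactly in the target (perfect-match model of stringency)."""
    probe = probe.upper()
    target = target.upper()
    return probe in target or reverse_complement(probe) in target


# Hypothetical sequences differing at a single base (C to T).
wild_type = "GATTACAGGCT"
variant = "GATTATAGGCT"
variant_probe = "TTATAGG"

print(probe_matches(variant_probe, variant))    # probe spans the alteration
print(probe_matches(variant_probe, wild_type))  # single mismatch, no match
```

Real hybridisation depends on temperature, salt concentration and probe length rather than on exact string identity, so the sketch indicates only the logic of the screen.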
Allele- or variant-specific oligonucleotides may similarly be
used in PCR to specifically amplify particular sequences if
present in a test sample. Assessment of whether a PCR band
contains a gene variant may be carried out in a number of ways
familiar to those skilled in the art. The PCR product may for
instance be treated in a way that enables one to display the
polymorphism on a denaturing polyacrylamide DNA sequencing
gel, with specific bands that are linked to the gene variants
being selected.
SSCP heteroduplex analysis may be used for screening DNA
fragments for sequence variants/mutations. It generally
involves amplifying radiolabelled 100-300 bp fragments of the
gene, diluting these products and denaturing at 95°C. The
fragments are quick-cooled on ice so that the DNA remains in
single stranded form. These single stranded fragments are run
through acrylamide based gels. Differences in the sequence
composition will cause the single stranded molecules to adopt
different conformations in this gel matrix making their
mobility different from wild type fragments, thus allowing
detection of mutations in the fragments being analysed
relative to a control fragment upon exposure of the gel to X-ray
film. Fragments with altered mobility/conformations may be
directly excised from the gel and directly sequenced for
mutation.
Sequencing of a PCR product may involve precipitation with
isopropanol, resuspension and sequencing using a TaqFS+ Dye
terminator sequencing kit. Extension products may be
electrophoresed on an ABI 377 DNA sequencer and data analysed
using Sequence Navigator software.
A further possible screening approach employs a PTT assay in
which fragments are amplified with primers that contain the
consensus Kozak initiation sequences and a T7 RNA polymerase
promoter. These extra sequences are incorporated into the 5'
primer such that they are in frame with the native coding
sequence of the fragment being analysed. These PCR products
are introduced into a coupled transcription/translation
system. This reaction allows the production of RNA from the
fragment and translation of this RNA into a protein fragment.
PCR products from controls make a protein product of a wild
type size relative to the size of the fragment being analysed.
If the PCR product analysed has a frame-shift or nonsense
mutation, the assay will yield a truncated protein product
relative to controls. The size of the truncated product is
related to the position of the mutation, and the relevant
region of the gene from this patient may be sequenced to
identify the truncating mutation.
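The relationship between the position of a truncating mutation and the size of the product may be illustrated by the following sketch, which counts the codons translated before the first in-frame stop codon. The function name and the short coding sequences are hypothetical and are not SAPL sequences; the sequence is assumed to start at the first base of the initiation codon.

```python
# Standard stop codons, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translated_length(coding_seq):
    """Count the codons read in frame from the first base until the
    first stop codon or the end of the sequence is reached."""
    seq = coding_seq.upper()
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Hypothetical 7-codon fragment with no stop codon.
wild_type = "ATG" "GCT" "GAA" "AAA" "CTG" "TCT" "TGG"
# Nonsense change in the third codon (GAA to TAA).
nonsense = "ATG" "GCT" "TAA" "AAA" "CTG" "TCT" "TGG"
# Single-base insertion after the first codon, shifting the frame.
frame_shift = "ATG" + "T" + wild_type[3:]

for name, seq in (("wild type", wild_type),
                  ("nonsense", nonsense),
                  ("frame-shift", frame_shift)):
    print(name, translated_length(seq))
```

A shorter count than that of the control corresponds to the truncated protein product seen in the assay, and the codon at which translation stops indicates where to begin sequencing.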
An alternative or supplement to looking for the presence of
variant sequences in a test sample is to look for the presence
of the normal sequence, e.g. using a suitably specific
oligonucleotide probe or primer. Use of oligonucleotide
probes and primers has been discussed in more detail above.
Allele- or variant-specific oligonucleotide probes or primers
according to embodiments of the present invention may be
selected from those shown in Table 1 and modified versions
thereof.
Approaches which rely on hybridisation between a probe and
test nucleic acid and subsequent detection of a mismatch may
be employed. Under appropriate conditions (temperature, pH
etc.), an oligonucleotide probe will hybridise with a sequence
which is not entirely complementary. The degree of base-
pairing between the two molecules will be sufficient for them
to anneal despite a mis-match. Various approaches are well
known in the art for detecting the presence of a mis-match
between two annealing nucleic acid molecules.
For instance, RNase A cleaves at the site of a mis-match.
Cleavage can be detected by electrophoresing test nucleic acid
to which the relevant probe or probes have annealed and looking
for smaller molecules (i.e. molecules with higher
electrophoretic mobility) than the full length probe/test
hybrid.
Thus, an oligonucleotide probe that has the sequence of a
region of the normal SAPL gene (either sense or anti-sense
strand) in which mutations associated with IDDM or other
disease susceptibility are known to occur (e.g. see Table 2)
may be annealed to test nucleic acid and the presence or
absence of a mis-match determined. Detection of the presence
of a mis-match may indicate the presence in the test nucleic
acid of a mutation associated with IDDM or other disease
susceptibility. On the other hand, an oligonucleotide probe
that has the sequence of a region of the gene including a
mutation associated with IDDM or other disease susceptibility
may be annealed to test nucleic acid and the presence or
absence of a mis-match determined. The presence of a
mis-match may indicate that the nucleic acid in the test sample
has the normal sequence (the absence of a mis-match indicating
that the test nucleic acid has the mutation). In either case,
a battery of probes to different regions of the gene may be
employed.
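The two probe designs described above lead to opposite interpretations of a mis-match, which can be sketched in a few lines. The sequences below are invented for illustration, and for simplicity the probe and the test region are both written as the same strand, so that a position-by-position difference stands in for a mis-match in the annealed hybrid.

```python
def mismatch_positions(probe, target):
    """Return 0-based positions at which two aligned sequences differ."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(probe.upper(), target.upper())
    return [index for index, (p, t) in enumerate(pairs) if p != t]


# Invented example sequences (not taken from the SAPL gene).
normal_probe = "GATTACAGGCT"
variant_probe = "GATTATAGGCT"   # carries a C to T change at position 5
test_normal = "GATTACAGGCT"
test_variant = "GATTATAGGCT"

# Normal-sequence probe: a mis-match suggests the test sample is variant.
print(mismatch_positions(normal_probe, test_variant))
# Variant-sequence probe: a mis-match suggests the test sample is normal.
print(mismatch_positions(variant_probe, test_normal))
```

A battery of probes to different regions would simply repeat this comparison once per probe.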
The presence of differences in sequence of nucleic acid
molecules may be detected by means of restriction enzyme
digestion, such as in a method of DNA fingerprinting where the
restriction pattern produced when one or more restriction
enzymes are used to cut a sample of nucleic acid is compared
with the pattern obtained when a sample containing the normal
gene shown in a figure herein or a variant or allele, e.g. as
containing an alteration shown in Table 2, is digested with
the same enzyme or enzymes.
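A comparison of restriction patterns of this kind can be sketched as below. The sequences are invented for the example, the molecule is treated as linear, and the EcoRI site (GAATTC, cut after the first G) is used only because it is a familiar enzyme; the specification does not name a particular enzyme here.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Lengths of the linear fragments left after cutting at every site.

    cut_offset is the number of bases of the site that lie before the cut.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


ECORI_SITE = "GAATTC"   # EcoRI cuts between G and AATTC
# Invented sequences: the variant loses the second site (GAATTC to GAGTTC).
normal = "AAAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
variant = "AAAA" + "GAATTC" + "CCCCC" + "GAGTTC" + "TT"

normal_pattern = fragment_lengths(normal, ECORI_SITE, 1)
variant_pattern = fragment_lengths(variant, ECORI_SITE, 1)
print(normal_pattern, variant_pattern)
```

Here the loss of one site merges two fragments into one, which is the kind of difference in pattern that would be seen on a gel.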
The presence or absence of a lesion in a promoter or other
regulatory sequence may also be assessed by determining the
level of mRNA production by transcription or the level of
polypeptide production by translation from the mRNA.
Determination of promoter activity has been discussed above.
A test sample of nucleic acid may be provided for example by
extracting nucleic acid from cells or biological tissues or
fluids, urine, saliva, faeces, a buccal swab, biopsy or
preferably blood, or for pre-natal testing from the amnion,
placenta or foetus itself.
Screening for the presence of one or more amino acid sequence
variants in a test sample has a diagnostic and/or prognostic
use, for instance in determining IDDM or other disease
susceptibility.
There are various methods for determining the presence or
absence in a test sample of a particular polypeptide, such as
the polypeptide with the amino acid sequence shown in any
figure herein or an amino acid sequence mutant, variant or
allele thereof.
A sample may be tested for the presence of a binding partner
for a specific binding member such as an antibody (or mixture
of antibodies), specific for one or more particular variants
of the polypeptide shown in a figure herein. A sample may be
tested for the presence of a binding partner for a specific
binding member such as an antibody (or mixture of antibodies),
specific for the polypeptide shown in a figure herein. In
such cases, the sample may be tested by being contacted with a
specific binding member such as an antibody under appropriate
conditions for specific binding, before binding is determined,
for instance using a reporter system as discussed. Where a
panel of antibodies is used, different reporting labels may be
employed for each antibody so that binding of each can be
determined.
A specific binding member such as an antibody may be used to
isolate and/or purify its binding partner polypeptide from a
test sample, to allow for sequence and/or biochemical analysis
of the polypeptide to determine whether it has the sequence
and/or properties of the polypeptide whose sequence is
disclosed herein, or if it is a mutant or variant form. Amino
acid sequencing is routine in the art using automated sequencing
machines.
A test sample containing one or more polypeptides may be
provided for example as a crude or partially purified cell or
cell lysate preparation, e.g. using tissues or cells, such as
from saliva, faeces, or preferably blood, or for pre-natal
testing from the amnion, placenta or foetus itself.
Whether it is a polypeptide, antibody, peptide, nucleic acid
molecule, small molecule or other pharmaceutically useful
compound according to the present invention that is to be
given to an individual, administration is preferably in a
"prophylactically effective amount" or a "therapeutically
effective amount" (as the case may be, although prophylaxis
may be considered therapy), this being sufficient to show
benefit to the individual. The actual amount administered,
and rate and time-course of administration, will depend on the
nature and severity of what is being treated. Prescription of
treatment, e.g. decisions on dosage etc, is within the
responsibility of general practitioners and other medical
doctors.
A composition may be administered alone or in combination with
other treatments, either simultaneously or sequentially
dependent upon the condition to be treated.
Pharmaceutical compositions according to the present
invention, and for use in accordance with the present
invention, may include, in addition to active ingredient, a
pharmaceutically acceptable excipient, carrier, buffer,
stabiliser or other materials well known to those skilled in
the art. Such materials should be non-toxic and should not
interfere with the efficacy of the active ingredient. The
precise nature of the carrier or other material will depend on
the route of administration, which may be oral, or by
injection, e.g. cutaneous, subcutaneous or intravenous.
Pharmaceutical compositions for oral administration may be in
tablet, capsule, powder or liquid form. A tablet may include
a solid carrier such as gelatin or an adjuvant. Liquid
pharmaceutical compositions generally include a liquid carrier
such as water, petroleum, animal or vegetable oils, mineral
oil or synthetic oil. Physiological saline solution, dextrose
or other saccharide solution or glycols such as ethylene
glycol, propylene glycol or polyethylene glycol may be
included.
For intravenous, cutaneous or subcutaneous injection, or
injection at the site of affliction, the active ingredient
will be in the form of a parenterally acceptable aqueous
solution which is pyrogen-free and has suitable pH,
isotonicity and stability. Those of relevant skill in the art
are well able to prepare suitable solutions using, for
example, isotonic vehicles such as Sodium Chloride Injection,
Ringer's Injection, or Lactated Ringer's Injection.
Preservatives, stabilisers, buffers, antioxidants and/or other
additives may be included, as required.
Targeting therapies may be used to deliver the active agent
more specifically to certain types of cell, by the use of
targeting systems such as antibody or cell specific ligands.
Targeting may be desirable for a variety of reasons; for
example if the agent is unacceptably toxic, or if it would
otherwise require too high a dosage, or if it would not
otherwise be able to enter the target cells.
Instead of administering an agent directly, it may be
produced in target cells by expression from an encoding gene
introduced into the cells, e.g. in a viral vector (see below).
The vector may be targeted to the specific cells to be
treated, or it may contain regulatory elements which are
switched on more or less selectively by the target cells.
Viral vectors may be targeted using specific binding
molecules, such as a sugar, glycolipid or protein such as an
antibody or binding fragment thereof. Nucleic acid may be
targeted by means of linkage to a protein ligand (such as an
antibody or binding fragment thereof) via polylysine, with the
ligand being specific for a receptor present on the surface of
the target cells.
An agent may be administered in a precursor form, for
conversion to an active form by an activating agent produced
in, or targeted to, the cells to be treated. This type of
approach is sometimes known as ADEPT or VDEPT; the former
involving targeting the activating agent to the cells by
conjugation to a cell-specific antibody, while the latter
involves producing the activating agent, e.g. an enzyme, in a
vector by expression from encoding DNA in a viral vector (see
for example, EP-A-415731 and WO 90/07936).
Nucleic acid according to the present invention, e.g. encoding
the authentic biologically active SAPL polypeptide or a
functional fragment thereof, may be used in a method of gene
therapy, to treat a patient who is unable to synthesize the
active polypeptide or unable to synthesize it at the normal
level, thereby providing the effect provided by the wild-type
with the aim of treating and/or preventing one or more
symptoms of IDDM and/or one or more other diseases.
Vectors such as viral vectors have been used to introduce
genes into a wide variety of different target cells.
Typically the vectors are exposed to the target cells so that
transfection can take place in a sufficient proportion of the
cells to provide a useful therapeutic or prophylactic effect
from the expression of the desired polypeptide. The
transfected nucleic acid may be permanently incorporated into
the genome of each of the targeted cells, providing long
lasting effect, or alternatively the treatment may have to be
repeated periodically.
A variety of vectors, both viral vectors and plasmid vectors,
are known in the art, see e.g. US Patent No. 5,252,479 and WO
93/07282. In particular, a number of viruses have been used
as gene transfer vectors, including adenovirus, papovaviruses,
such as SV40, vaccinia virus, herpesviruses, including HSV and
EBV, and retroviruses, including gibbon ape leukaemia virus,
Rous sarcoma virus, Venezuelan equine encephalitis virus,
Moloney murine leukaemia virus and murine mammary tumour virus.
Many gene therapy protocols in the prior art have used
disabled murine retroviruses.
Disabled virus vectors are produced in helper cell lines in
which genes required for production of infectious viral
particles are expressed. Helper cell lines are generally
missing a sequence which is recognised by the mechanism which
packages the viral genome and produce virions which contain no
nucleic acid. A viral vector which contains an intact
packaging signal along with the gene or other sequence to be
delivered (e.g. encoding the SAPL polypeptide or a fragment
thereof) is packaged in the helper cells into infectious
virion particles, which may then be used for the gene
delivery.
Other known methods of introducing nucleic acid into cells
include electroporation, calcium phosphate co-precipitation,
mechanical techniques such as microinjection, transfer
mediated by liposomes and direct DNA uptake and
receptor-mediated DNA transfer. Liposomes can encapsulate RNA, DNA and
virions for delivery to cells. Depending on factors such as
pH, ionic strength and divalent cations being present, the
composition of liposomes may be tailored for targeting of
particular cells or tissues. Liposomes include phospholipids
and may include lipids and steroids and the composition of
each such component may be altered. Targeting of liposomes
may also be achieved using a specific binding pair member such
as an antibody or binding fragment thereof, a sugar or a
glycolipid.
The aim of gene therapy using nucleic acid encoding the
polypeptide, or an active portion thereof, is to increase the
amount of the expression product of the nucleic acid in cells
in which the wild-type polypeptide is absent or
present only at reduced levels. Such treatment may be
therapeutic or prophylactic, particularly in the treatment of
individuals known through screening or testing to have an
IDDM4 susceptibility allele and hence a predisposition to the
disease.
Similar techniques may be used for anti-sense regulation of
gene expression, e.g. targeting an antisense nucleic acid
molecule to cells in which a mutant form of the gene is
expressed, the aim being to reduce production of the mutant
gene product. Other approaches to specific down-regulation of
genes are well known, including the use of ribozymes designed
to cleave specific nucleic acid sequences. Ribozymes are
nucleic acid molecules, actually RNA, which specifically cleave
single-stranded RNA, such as mRNA, at defined sequences, and
their specificity can be engineered. Hammerhead ribozymes may
be preferred because they recognise base sequences of about
11-18 bases in length, and so have greater specificity than
ribozymes of the Tetrahymena type which recognise sequences of
about 4 bases in length, though the latter type of ribozymes
are useful in certain circumstances. References on the use of
ribozymes include Marschall, et al. Cellular and Molecular
Neurobiology, 1994. 14(5): 523; Hasselhoff, Nature 334: 585
(1988) and Cech, J. Amer. Med. Assn., 260: 3030 (1988).
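The link between recognition length and specificity can be made concrete with a rough calculation. The sketch below assumes a random sequence in which the four bases are equally frequent, which real transcripts only approximate; under that assumption a given site of n bases is expected about once every 4^n bases.

```python
def expected_spacing(site_length, alphabet_size=4):
    """Average number of bases between chance occurrences of one site,
    assuming independent, equally frequent bases (a simplification)."""
    return alphabet_size ** site_length


for length in (4, 11, 18):
    print(length, expected_spacing(length))
```

Under this simplification a 4-base site recurs about every 256 bases, whereas an 11-base site recurs about once in 4.2 million bases, so the longer recognition sequences are far less likely to occur by chance in unintended RNA molecules.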
Aspects of the present invention will now be illustrated with
reference to the accompanying figures described already above
and experimental exemplification, by way of example and not
limitation. Further aspects and embodiments will be apparent
to those of ordinary skill in the art. All documents
mentioned in this specification are hereby incorporated herein
by reference.
EXAMPLE 1
IDENTIFICATION OF IDDM4 EST4 (SAPL)
Construction of Libraries for Shotgun Sequencing
DNA was prepared from BAC (Bacterial Artificial Chromosome)
clones 14-1-15 and 25-e-5. Cells containing either BAC vector
were streaked on Luria-Bertani (LB)agar plates supplemented
with the appropriate antibiotic. A single colony was used to
inoculate 200 ml of LB media supplemented with the appropriate
antibiotic and grown overnight at 37°C. The cells were
pelleted by centrifugation and plasmid DNA was prepared by
following the QIAGEN (Chatsworth, CA) Tip500 Maxi
plasmid/cosmid purification protocol with the following
modifications; the cells from 100 ml of culture were used for
each Tip500 column, the NaCl concentration of the elution
buffer was increased from 1.25 M to 1.7 M, and the elution
buffer was heated to 65°C.
Purified BAC and PAC DNA was digested with Not I restriction
endonuclease and then subjected to pulsed field gel
electrophoresis using a BioRad CHEF Mapper system (Richmond,
CA). The digested DNA was electrophoresed overnight in a 1%
low melting temperature agarose (BioRad, Richmond CA) gel that
was prepared with 0.5X Tris Borate EDTA (10X stock solution,
Fisher, Pittsburgh, PA ). The CHEF Mapper autoalgorithm
default settings were used for switching times and voltages.
Following electrophoresis the gel was stained with ethidium
bromide (Sigma, St. Louis, MO) and visualized with an
ultraviolet transilluminator. The insert band(s) was excised
from the gel. The DNA was eluted from the gel slice by
beta-Agarase (New England Biolabs, Beverly MA) digestion according
to the manufacturer's instructions. The solution containing
the DNA and digested agarose was brought to 50 mM Tris pH 8.0,
15 mM MgCl2, and 25% glycerol in a volume of 2 ml and placed
in an AERO-MIST nebulizer (CIS-US, Bedford MA). The nebulizer
was attached to a nitrogen gas source and the DNA was randomly
sheared at 10 psi for 30 sec.
The sheared DNA was ethanol precipitated and resuspended in TE
(10 mM Tris, 1 mM EDTA). The ends were made blunt by
treatment with Mung Bean Nuclease (Promega, Madison, WI) at
30°C for 30 min, followed by phenol/chloroform extraction, and
treatment with T4 DNA polymerase (GIBCO/BRL, Gaithersburg, MD)
in multicore buffer (Promega, Madison, WI) in the presence of
40 µM dNTPs at 16 °C. To facilitate subcloning of the DNA
fragments, BstX I adapters (Invitrogen, Carlsbad, CA) were
ligated to the fragments at 14 °C overnight with T4 DNA ligase
(Promega, Madison WI). Adapters and DNA fragments less than
500 bp were removed by column chromatography using a cDNA
sizing column (GIBCO/BRL, Gaithersburg, MD) according to the
instructions provided by the manufacturer. Fractions
containing DNA greater than 1 kb were pooled and concentrated
by ethanol precipitation. The DNA fragments containing BstX I
adapters were ligated into the BstX I sites of pSHOT II which
was constructed by subcloning the BstX I sites from pcDNA II
(Invitrogen, Carlsbad, CA) into the BssH II sites of
pBlueScript (Stratagene, La Jolla, CA). pSHOT II was prepared
by digestion with BstX I restriction endonuclease and purified
by agarose gel electrophoresis. The gel purified vector DNA
was extracted from the agarose by following the Prep-A-Gene
(BioRad, Richmond, CA) protocol. To reduce ligation of the
vector to itself, the digested vector was treated with calf
intestinal phosphatase (GIBCO/BRL, Gaithersburg, MD). Ligation
reactions of the DNA fragments with the cloning vector were
transformed into ultra-competent XL-2 Blue cells (Stratagene,
La Jolla, CA), and plated on LB agar plates supplemented with
100 µg/ml ampicillin. Individual colonies were picked into a
96 well plate containing 100 µl/well of LB broth supplemented
with ampicillin and grown overnight at 37 °C. Approximately
25 µl of 80% sterile glycerol was added to each well and the
cultures stored at -80 °C.
Preparation of plasmid DNA
Glycerol stocks were used to inoculate 5 ml of LB broth
supplemented with 100 µg/ml ampicillin either manually or by
using a Tecan Genesis RSP 150 robot (Tecan AG, Hombrechtikon,
Switzerland) programmed to inoculate 96 tubes containing 5 ml
broth from the 96 wells. The cultures were grown overnight at
37°C with shaking to provide aeration. Bacterial cells were
pelleted by centrifugation, the supernatant decanted, and the
cell pellet stored at -20 °C. Plasmid DNA was prepared with a
QIAGEN Bio Robot 9600 (QIAGEN, Chatsworth CA) according to the
Qiawell Ultra protocol. To test the frequency and size of
inserts, plasmid DNA was digested with the restriction
endonuclease Pvu II. The size of the restriction endonuclease
products was examined by agarose gel electrophoresis with the
average insert size being 1 to 2 kb.
DNA Sequence Analysis of Shotgun clones
DNA sequence analysis was performed using the ABI PRISM™ dye
terminator cycle sequencing ready reaction kit with AmpliTaq
DNA polymerase, FS (Perkin Elmer, Norwalk, CT). DNA sequence
analysis was performed with M13 forward and reverse primers.
Following amplification in a Perkin-Elmer 9600 the extension
products were purified and analyzed on an ABI PRISM 377
automated sequencer (Perkin Elmer, Norwalk, CT).
Approximately 12 to 15 sequencing reactions were performed per
kb of DNA to be examined, e.g. 1500 reactions would be
performed for a BAC insert of 100 kb.
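The sequencing workload quoted above follows from simple multiplication, sketched here. The function name is introduced only for this illustration.

```python
def reactions_needed(insert_kb, reactions_per_kb):
    """Number of sequencing reactions for an insert of the given size."""
    return insert_kb * reactions_per_kb


low = reactions_needed(100, 12)
high = reactions_needed(100, 15)
print(low, high)
```

For a 100 kb BAC insert the range is therefore 1200 to 1500 reactions, and the figure of 1500 given in the text corresponds to the upper rate of 15 reactions per kb.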
Assembly of DNA sequences
Phred/Phrap was used for DNA sequence assembly. This program
was developed by Dr. Phil Green and licensed from the
University of Washington (Seattle, WA). Phred/Phrap consists
of the following programs: Phred for base-calling, Phrap for
sequence assembly, Crossmatch for sequence comparisons, Consed
and Phrapview for visualization of data, and Repeatmasker for
screening repetitive sequences. Vector and E. coli DNA
sequences were identified by Crossmatch and removed from the
DNA sequence assembly process. DNA sequence assembly was performed on a
SUN Enterprise 4000 server running Solaris 2.51 operating
system (Sun Microsystems Inc., Mountain View, CA). The
sequence assemblies were further analyzed using Consed and
Phrapview.
Bioinformatic Analysis of Assembled DNA Sequences
The DNA sequences at various stages of assembly were queried
against the DNA sequences in the GenBank database (subject)
using the BLAST algorithm (S.F. Altschul, et al. (1990) J.
Mol. Biol. 215, 403-410). When examining large contiguous
sequences of DNA, repetitive elements were masked following
identification by Crossmatch with a database of mammalian
repetitive elements. Following BLAST analysis the results
were compiled by a parser program. The parser provided the
following information from the database for each DNA sequence
having a similarity with a P value greater than 10^-6: the
annotated name of the sequence, the database from which it was
derived, the length and percent identity of the region of
similarity, and the location of the similarity in both the
query and the subject.
Analysis of DNA sequences from BAC 14-1-15 revealed an EST
aa194169 which was 91% identical over 60 nucleotides. Several
lines of evidence indicated that this was an authentic mRNA
transcript and that this EST represented the 5' most portion
of that mRNA transcript. The first piece of evidence was
revealed by comparing sequences obtained from a mouse BAC
clone 53-d-8 that is syntenic with BAC 14-1-15 (Figure 4).
The human genomic DNA corresponding to EST aa194169 was
conserved 88% over 43 bp with the mouse genomic DNA. This
region of human genomic DNA exhibited a relatively high score
in the promoter prediction algorithm PROMOTERSCAN (Prestridge
(1995) J. Mol Biol. 249: 923-932) and the presence of a
cluster of DNA sequences that are predicted to serve as
transcription factor binding sites (Figure 4). This region of
human genomic DNA was predicted to be a CpG island, which is
often associated with the 5' end of genes. These sequences lie
approximately 11 kb downstream of the 3' end of a gene, LRP5,
that we have previously characterized. Finally, DNA sequences
obtained from BAC clone 25-e-5 revealed additional genomic
sequences that were represented by EST aa194169 indicating the
presence of an intron located between two exons. Together
these data indicate that the 5' portion of IDDM4 EST4
corresponds to aa194169 and that this EST sequence is derived
from an authentic mRNA transcript. To isolate the open reading
frame for this gene, RCCA analysis was focused on extending
the sequence in the 3' direction.
Extension of IDDM4 EST4 by RCCA
The full length cDNA of one aspect of the present invention
was generated by a method of cDNA screening called Reduced
Complexity cDNA Analysis (RCCA). Briefly, the extension of
partial cDNA sequences has historically been achieved with
one or both of the two commonly used methods: filter screening
of cDNA libraries by hybridization with labeled probes, and 5'-
and 3'-RACE with total cellular mRNA by PCR. The first method
is effective but laborious and slow while the latter method is
fast but limited in efficiency. This RACE protocol is
hindered by limited length of extension due to the use of the
entire cellular mRNA population in a single reaction. Since
smaller fragments are amplified much more efficiently than
larger fragments by PCR in the same reaction, PCR products
obtained using the second method are often quite small.
The RCCA method improves upon known methods of cDNA library
screening by initially constructing and subdividing cDNA
libraries followed by isolating 5'- and 3'- flanking fragments
by PCR. Since each pool is unlikely to contain more than one
clone for a given gene which is low to moderately expressed,
competition between large and small PCR products in one pool
does not exist, making it possible to isolate fragments of
various sizes. One definite advantage of the method as
described herein is its efficiency, throughput, and
potential to isolate alternatively spliced cDNA forms.
The RCCA process provides for rapid extension of a partial
cDNA sequence based on subdividing a primary cDNA library and
DNA amplification by polymerase chain reaction (PCR). A cDNA
library is constructed with cDNA primed by random, oligo-dT or
a combination of both random and oligo-dT primers and then
subdivided into pools at approximately 10,000-20,000 clones
per pool that are stored in a 96-well plate. Each pool (well)
is amplified separately and therefore represents an
independent portion of the cDNA molecules from the original
mRNA source.
The fundamental principle of the RCCA process is to subdivide
a complex library into superpools of about 10,000 to about
20,000 clones. A library of two million primary clones, a
number large enough to cover most mRNA transcripts expressed
in the tissue, can be subdivided into 188 pools and stored in
two 96-well plates. Since the number of transcripts for most
genes is fewer than one copy per 10,000 transcripts in total
cellular mRNA, each pool is unlikely to contain more than one
clone for a given cDNA sequence. Such reduced complexity
makes it possible to use PCR to isolate flanking fragments of
partial cDNA sequences larger than those obtained by known
methods.
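The reasoning behind the pool size can be checked with the figures given above. This is a rough sketch: the function names are introduced for illustration, and clones are assumed to be distributed evenly among the pools.

```python
def clones_per_pool(library_size, pool_count):
    """Average number of primary clones held in each pool."""
    return library_size / pool_count


def expected_clones_in_pool(library_size, pool_count, transcript_fraction):
    """Expected number of clones of one transcript in a single pool."""
    return clones_per_pool(library_size, pool_count) * transcript_fraction


# Figures from the text: two million clones divided into 188 pools, and a
# transcript present at one copy per 10,000 transcripts.
pool_size = clones_per_pool(2_000_000, 188)
expected = expected_clones_in_pool(2_000_000, 188, 1 / 10_000)
print(round(pool_size), round(expected, 2))
```

Each pool thus holds roughly 10,600 clones, and a transcript at one copy per 10,000 is expected about once per pool; since most genes are stated to be rarer than this, a pool will usually contain at most one clone of a given cDNA.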
The skilled artisan, aided with this specification, will
understand the far reaching cDNA cloning process disclosed
herein: multiple primer combinations from an EST or other
partial cDNA sequence, in combination with flanking vector
primer oligonucleotides can be used to "walk" in both
directions away from the internal, gene specific, sequence,
and respective primers, such that a contig representing a full
length cDNA can be constructed. In this particular case, the
5' end of the cDNA was represented by an EST with GenBank
accession number aa194169. Therefore the RCCA procedure was
only employed to obtain sequences 3' of EST aa194169. This
procedure relies on the ability to screen multiple pools which
comprise a representative portion of the total cDNA library.
This procedure is not dependent upon using a cDNA library with
directionally cloned inserts. Instead, both 5' and 3' vector
and gene specific primers are added and a contig map is
constructed from additional screening of positive pools using
both vector primers and gene specific primers. Of course,
these gene specific primers are initially constructed from a
known nucleic acid fragment such as an expressed sequence tag.
However, as the walk continues, gene specific primers are
utilized from the 5' and 3' boundaries of the newly identified
regions of the cDNA. As the walk continues, there is still no
requirement that the vector orientation of a yet unidentified
fragment be known. Instead, all combinations are tested on a
positive pool and the actual vector orientation is determined
by the ability of certain vector/gene specific primers to
generate the predicted PCR fragment. A full-length cDNA may
then be constructed by known subcloning procedures.
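The testing of all vector/gene specific primer combinations on a positive pool can be sketched as follows. The primer labels, the size tolerance and the gel read-out are all hypothetical and are introduced only to illustrate the bookkeeping; they are not the primers used in the Examples.

```python
from itertools import product

# Hypothetical primer labels; the specification does not use these names.
GENE_PRIMERS = ("gene_forward", "gene_reverse")
VECTOR_PRIMERS = ("vector_5prime", "vector_3prime")


def primer_combinations():
    """Every gene specific primer paired with every vector primer."""
    return list(product(GENE_PRIMERS, VECTOR_PRIMERS))


def productive_pairs(observed_sizes, expected_size, tolerance=50):
    """Pairs whose observed product is within tolerance (bp) of the
    predicted size. A value of None means no band was seen."""
    return [
        pair
        for pair, size in observed_sizes.items()
        if size is not None and abs(size - expected_size) <= tolerance
    ]


# Invented read-out for one positive pool (sizes in bp).
observed = {pair: None for pair in primer_combinations()}
observed[("gene_forward", "vector_3prime")] = 1800
hits = productive_pairs(observed, expected_size=1800)
print(hits)
```

In this invented read-out only one of the four combinations gives the predicted fragment, and that pairing is what indicates the orientation of the insert in the vector.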
RCCA was used to extend the partial cDNA sequence originally
identified by similarity between EST aa194169 and BAC 14-1-15
genomic DNA sequences. Positive pools containing the cDNA
sequence were identified by PCR using a pair of primers,
4dest4 3f and 4dest4 1r, at a final concentration of 0.15 µM,
which generate a PCR product of 377 nucleotides. This product
was obtained by 40 amplification cycles of denaturation at 94°C
for 30 seconds, primer annealing at 60°C for 30 sec, and
product extension at 68°C for 1 min. The DNA polymerase used
for PCR amplification was the enzyme TaqGold (Perkin-Elmer,
Norwalk CT) and the reaction volume was 10 µl. The PCR
template was RCCA pools from size selected libraries >2.5 kb
from prostate and testis. Positive pools were identified by
detection of the 377 bp product by agarose (2%) gel
electrophoresis. Each positive pool in the library contains
an independent clone of the cDNA sequence; within each clone
are embedded the partial cDNA sequence and its flanking
fragments. The flanking fragments are isolated by PCR with
primers complementary to the known vector and cDNA sequences
and then sequenced directly. To extend the cDNA clone in the
3' direction the primers 4dest4 3f and 4dest4 6f were used in
combination with the vector primers, 5438 and 873F, in a
primary reaction using Taqara LA (Panvera, Madison, WI). The
amplification conditions were 20 cycles of denaturation at 94
°C for 30 sec, primer annealing at 60 °C for 30 sec and
extension for 4 min at 68 °C in a 10 µl reaction volume. The
primary reaction was diluted by adding 9 parts water and an
aliquot was removed for a second PCR reaction containing
primer 4dest 7f (4dest4 6f primary reactions) and 4dest4f (for
4dest4 3f primary reactions). The secondary reactions were
amplified 25 cycles using Taqara LA as described above. The
DNA sequences from these fragments were assembled with the
original partial cDNA sequence to generate a continuous cDNA
fragment of 2482 nucleotides. The longest clone obtained by
RCCA provided an extension of 1.8 kb. The cDNA sequence of
2482 nucleotides was used to search the GenBank database using
the BLAST algorithm. This resulted in the identification of a
Unigene EST cluster represented by GenBank Accession number
aa193106. A number of EST sequences that were present in this
Unigene cluster were assembled to produce approximately 1.4 kb
of cDNA sequence. PCR primers were then designed based on the
Unigene cluster to link these sequences to the DNA sequences
identified by RCCA. This resulted in the identification of
4.8 kb of cDNA sequence that contains an open reading frame of
2382 nucleotides which encodes a protein of 794 amino acids.
One of the RCCA clones, clone 33, diverged from the other
sequences after nucleotide 2608 to form isoform(b). The
divergent sequence in isoform(b) is identical to isoform(a)
from nucleotide 4172 to 4682. Therefore this sequence likely
represents an alternatively spliced mRNA transcript in which
isoform(a) nucleotides 2609-4171 are missing. Isoform (b)
contains an open reading frame which encodes a protein of 791
amino acids of which the first 776 amino acids are identical
to isoform (a).
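The figures quoted for the open reading frame and for the segment absent from isoform (b) can be checked arithmetically, as sketched below. The variable names are introduced for this illustration, and the calculation uses only the nucleotide positions stated in the text.

```python
# Open reading frame of isoform (a), as stated in the text.
orf_nucleotides = 2382
codons = orf_nucleotides // 3
remainder = orf_nucleotides % 3

# Isoform (a) nucleotides reported as missing from isoform (b), inclusive.
first_missing, last_missing = 2609, 4171
missing_length = last_missing - first_missing + 1

print(codons, remainder, missing_length)
```

The open reading frame of 2382 nucleotides divides exactly into 794 codons, matching the stated protein length, and the segment absent from isoform (b) spans 1563 nucleotides.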
Identification of polymorphisms in SAPL
The process of RCCA generates clones that may differ in
origin, i.e. the mRNA used to synthesize the cDNA (e.g. from
testis) may be derived from an individual heterozygous
at the SAPL locus or may be from a pool of different
individuals. Therefore polymorphisms between different RCCA
clones may represent true differences; alternatively these
differences may arise from PCR mistakes or from errors that
are made by the DNA polymerase during the propagation of
vectors containing SAPL inserts in E. coli. To discriminate
against these types of errors polymorphisms were only noted
when detected in more than one clone and where the sequence
quality was excellent. All the candidate polymorphisms that
were detected lie in the putative 3' untranslated portion of
the cDNA and thus have no effect on the encoded protein.
Northern Blot Analysis
Primers 4dest4 2f and 4dest4 2r (Table 2) were used to amplify
a PCR product of 957 bp from placenta, testis, thymus or lymph
node cDNA. These products were purified on an agarose gel, the
DNA extracted, and subcloned into pCR2.1 (Invitrogen,
Carlsbad, CA). The 957 bp probe was labeled by random priming
with the Amersham Rediprime kit (Arlington Heights, IL) in the
presence of 50-100 µCi of 3000 Ci/mmole [alpha-32P]dCTP
(Dupont/NEN, Boston, MA). Unincorporated nucleotides were
removed with a ProbeQuant G-50 spin column (Pharmacia/Biotech,
Piscataway, NJ). The radiolabeled probe at a concentration of
25greater than 1 x 106 cpm/ml in rapid hybridization buffer
(Clontech, Palo Alto, CA) was incubated overnight at 65°C with
human multiple tissue Northern blots I and II (Clontech, Palo
Alto, CA). The blots were washed by two 15 min incubations in
2X SSC, 0.1% SDS (prepared from 20X SSC and 20% SDS stock
solutions, Fisher, Pittsburgh, PA) at room temperature,
followed by two 15 min incubations in 1X SSC, 0.1% SDS at room
temperature, and two 30 min incubations in 0.1X SSC, 0.1% SDS
at 60°C. Autoradiography of the blots was done to visualize
the bands that specifically hybridized to the radiolabeled
probe.
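The three wash solutions are dilutions of the 20X SSC and 20% SDS stocks, so the volumes follow from C1 x V1 = C2 x V2. The Python sketch below illustrates the calculation; the batch volume of 1000 ml and the function names are assumptions made for the example and are not specified in the protocol.

```python
def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock needed so that C1 * V1 = C2 * V2."""
    if final_conc > stock_conc:
        raise ValueError("final concentration exceeds stock concentration")
    return final_conc * final_volume / stock_conc


def wash_recipe(ssc_x: float, sds_percent: float, total_ml: float,
                ssc_stock_x: float = 20.0, sds_stock_percent: float = 20.0) -> dict:
    """Volumes (ml) of 20X SSC, 20% SDS and water for one wash solution."""
    ssc_ml = stock_volume(ssc_stock_x, ssc_x, total_ml)
    sds_ml = stock_volume(sds_stock_percent, sds_percent, total_ml)
    return {"ssc_ml": ssc_ml, "sds_ml": sds_ml, "water_ml": total_ml - ssc_ml - sds_ml}


# The three washes described above, each made up to an assumed 1000 ml.
for ssc_x in (2.0, 1.0, 0.1):
    print(f"{ssc_x}X SSC, 0.1% SDS:", wash_recipe(ssc_x, 0.1, 1000.0))
```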
The expression pattern in a number of tissues was examined by
Northern blot analysis. Two distinct bands were detected, one
of approximately 4.9 kb and the second of approximately 4.1
kb. In most tissues the predominant band is the larger 4.9 kb
band; however, in the testis the lower band of approximately
4.1 kb is the predominant one. This lower band may indicate
an alternatively spliced form that differs from SAPLb, which
may be investigated in the testis. The first band likely
corresponds to the SAPLa cDNA for which the sequence of 4793
nucleotides has been determined. The second band may
correspond to SAPLb for which the sequence of 3228 nucleotides
has been determined. Alternatively, the approximately 4.1 kb
band may correspond to an as yet unidentified alternatively
spliced form, in which case SAPLb would be a rare
alternatively spliced transcript. The highest level of SAPL
expression is seen in skeletal muscle, placenta, heart,
pancreas and testis. Detectable expression is also observed
in brain, lung, liver, kidney, spleen, thymus, prostate, small
intestine, colon, and leukocytes. No detectable expression is
seen in ovary.
Identification of intron/exon boundaries for SAPL
The program Crossmatch, which uses the Smith-Waterman algorithm,
was used to compare SAPL cDNA sequences with BAC 14-1-15 and
BAC 25-e-5 genomic sequences. This identified the boundaries
for the first five exons of SAPL, which correspond to the first 865
nucleotides of the cDNA sequence (Table 3).
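For orientation, the Smith-Waterman local alignment named above can be sketched in a few lines of Python. This is a minimal textbook version with a linear gap penalty; the scoring values (match 2, mismatch -1, gap -2) and the toy sequences are assumptions chosen for the example and are not the parameters or implementation used by Crossmatch.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy example: a short exon-like query against a longer genomic-like target.
print(smith_waterman("TCCAG", "AAATCCAGGTAA"))
```

When a cDNA is aligned against genomic sequence in this way, each exon tends to appear as a separate high-scoring local match, and the ends of those matches suggest where the intron/exon boundaries lie.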
Isolation of other species homologs of SAPL gene
The SAPL genes from different species, e.g. rat, dog, are
isolated by screening of a cDNA library with portions of the
gene that have been obtained from cDNA of the species of
interest using PCR primers designed from the human sequence.
Degenerate PCR is performed by designing primers of 17-20
nucleotides with 32-128 fold degeneracy by selecting regions
that code for amino acids that have low codon degeneracy, e.g.
Met and Trp. When selecting these primers, preference is given
to regions that are conserved in the protein, e.g. the motifs
shown herein. PCR products are analyzed by DNA sequence
analysis to confirm their similarity to the human sequence.
The correct product is used to screen cDNA libraries by colony
or plaque hybridization at high stringency. Alternatively,
probes derived directly from the human gene are utilized to
isolate the cDNA sequence of SAPL from different species by
hybridization at reduced stringency.
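The degeneracy figure used when choosing degenerate primer sites can be estimated by multiplying the number of codons for each amino acid in the chosen stretch. The Python sketch below does this for six-residue windows (18 nucleotides) using the standard genetic code; the peptide used is hypothetical, and the product of codon counts is a simplification, since a synthesized mixed-base oligonucleotide covering six-codon amino acids (Leu, Ser, Arg) may have a different degeneracy.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct codon combinations that encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide.upper())


def candidate_windows(protein: str, length: int = 6, lo: int = 32, hi: int = 128):
    """Return (1-based start, peptide, degeneracy) for windows within [lo, hi]."""
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        d = degeneracy(peptide)
        if lo <= d <= hi:
            hits.append((start + 1, peptide, d))
    return hits


# Hypothetical peptide, used only to show the calculation.
print(candidate_windows("MWDEHKFY"))
```

Windows containing Met or Trp contribute a factor of 1, which is why regions rich in those residues keep the overall degeneracy low.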
Use of the SAPL cDNA sequence to search the GenBank database
using the FASTA algorithm revealed mouse EST AA684416, which is
93% identical to the SAPL cDNA sequence from 590 to 1080. This
is likely the mouse ortholog of human SAPL. It and other
mouse ESTs such as AA435418, which is 86% identical from 2888
to 3348, are used in the isolation of the mouse SAPL cDNA
either by a PCR based or nucleic acid hybridization based
strategy.
EXAMPLE 2
Association with diabetes.
Type 1 diabetes is a multifactorial disorder, with the genetic
component being oligo- or polygenic. Two loci have been
identified as conferring susceptibility to type 1 diabetes by
candidate gene approaches. The main locus is encoded by the
major histocompatibility complex (MHC) on chromosome 6p
(IDDM1) (Morton, N., et al. (1983) Am J Hum Genet 35, 201-213;
Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448) with
the second locus, IDDM2, the insulin minisatellite or variable
number of tandem repeats (VNTR) on chromosome 11p (Bennett,
S., et al (1995) Nature Genet 9, 284-292). These two loci
alone, however, cannot account for the degree of
familial clustering of disease observed in families, where
λs=15 (λs = sibling risk/population prevalence); IDDM1 and IDDM2
have λs=3 and 1.25 respectively, accounting for 50% of familial
clustering (Morton, N., et al. (1983) Am J Hum Genet 35, 201-
213; Todd, J and Farrall, M. (1996) Hum Mol Genet 5, 1443-1448;
Bennett, S., et al (1995) Nature Genet 9, 284-292; Risch, N.
(1987) Am J Hum Genet 40, 1-14). A positional cloning
approach was therefore undertaken to identify the other loci:
a genome wide scan for linkage suggested another 18 possible
regions (Davies, J. et al. (1994) Nature 371, 130-136),
including IDDM4 on chromosome 11q13 (MLS 3.4, p<0.0001 at
FGF3). This locus was subsequently confirmed at levels of
genome-wide significance (p<2 x 10^-5) (Todd, J and Farrall, M.
(1996) Hum Mol Genet 5, 1443-1448; Luo, D-F., et al. (1996) Hum
Mol Genet 5, 693-698).
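The figure of about 50% of familial clustering quoted above can be reproduced under one common convention, in which loci are assumed to act multiplicatively so that their contributions add on a logarithmic scale. The text does not state the formula used, so the model in the Python sketch below is an assumption, and the function name is hypothetical.

```python
import math


def fraction_of_clustering(locus_lambdas, total_lambda: float) -> float:
    """Share of familial clustering explained, assuming multiplicative loci.

    Under this assumption the locus-specific sibling risk ratios multiply,
    so each locus contributes log(lambda_locus) / log(lambda_total).
    """
    return sum(math.log(x) for x in locus_lambdas) / math.log(total_lambda)


# IDDM1 (lambda_s = 3) and IDDM2 (lambda_s = 1.25) against an overall lambda_s of 15.
share = fraction_of_clustering([3.0, 1.25], 15.0)
print(f"fraction explained: {share:.3f}")
```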
To investigate the extent of linkage within this region, 704
multiplex families (426 UK, 236 US, 32 Norway, 39 Italy) were
analysed with 19 microsatellite markers in a 25cM interval
spanning FGF3. A multipoint linkage curve was produced
(MAPMAKER/SIBS; Kruglyak, L and Lander, S. (1995) Am J Hum
Genet 57, 439-454) with a peak MLS=2.8 (p<0.0003) at D11S1889
(Figure 5), indicating that IDDM4 was localised to within the
18cM interval D11S903 to D11S534 (Nakagawa, Y., et al (1997)
Fine mapping of a Type 1 Diabetes Susceptibility Gene (IDDM4)
on Chromosome 11q13. Hum. Mol. Genet. Submitted).
Multipoint linkage analysis cannot localise the gene to a
small region. Instead, association mapping has been used for
rare single gene traits, which can narrow the interval to less
than 2cM or 2Mb. In theory, associations of a particular
allele very close to the founder mutation will be detected in
populations descended from that founder. The transmission
disequilibrium test (TDT - Spielman, R., et al (1993) Am J Hum
Genet 52, 506-516) assesses the deviation from 50% of the
transmission of alleles from a marker locus from parents to
affected children. A strategy was undertaken with the
IDDM4 linkage region, using TDT, to detect linkage in the
presence of association, which had also been previously used
to fine map the putative IDDM6 locus on chromosome 18q21
(Merriman, T. et al. (1997) Hum. Mol. Genet. 6, 1003-1010).
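The deviation from 50% transmission that the TDT assesses is commonly evaluated with a McNemar-type chi-square statistic on one degree of freedom. The Python sketch below shows that calculation for a single allele; the counts are hypothetical (they are not the study data), and the function name is an assumption.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """TDT for one allele, counted over heterozygous parents.

    Returns (chi-square, p value on 1 degree of freedom, fraction transmitted).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, transmitted / total


# Hypothetical counts: 58 transmissions versus 42 non-transmissions.
chi2, p_value, fraction = tdt(58, 42)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, transmission={fraction:.0%}")
```

With equal numbers of transmissions and non-transmissions the statistic is zero, which corresponds to the 50% expectation under no linkage or association.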
TDT analysis of 658 UK and US families showed a deviation in
transmission of alleles of four loci. Analysis of the three
most common alleles, with p(uncorrected)<0.05: D11S4205, 54%
transmission, p=0.03; D11S1783, 58% transmission, p=0.0005;
D11S1189, 46% transmission, p=0.05; H0570POLYA, 54%
transmission, p=0.01. The multiallelic Tsp test, which is a
test for association of loci with multiple alleles, was
undertaken on these loci (Martin, E. et al. (1997) Am. J. Hum. Genet.
61, 439-448). This confirmed the results with D11S4205
(Tsp=17.5, p=0.01), D11S1783 (Tsp=23.6, p=0.0001) and
H0570POLYA (Tsp=12.4, p=0.03). D11S4205 (proximal) and
D11S1783 (distal) are approximately 1Mb apart, and so may be
showing association with one locus. H0570POLYA is
approximately 3Mb distal to D11S1783 and therefore may be
showing association with a second locus. Figure 6 shows the
LOD score of the Tsp analysis (-log of the p value). Further
analysis of H0570POLYA in 2042 families with type 1 diabetes
confirmed the association observed between this marker and type 1
diabetes (2x2 test of heterogeneity for affected versus
unaffected siblings, p(corrected)<4.8 x 10^-5) (Nakagawa, Y., et al
(1997) Hum. Mol. Genet. Submitted).
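The score plotted for the Tsp analysis is described as the negative logarithm of the p value. The short Python sketch below applies that transformation to the three Tsp p values reported above; base 10 is assumed, since the base is not stated, and the function name is hypothetical.

```python
import math


def neg_log10(p: float) -> float:
    """Negative base-10 logarithm of a p value (base 10 is an assumption)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log10(p)


# Tsp p values reported for the three confirmed markers.
tsp_p_values = {"D11S4205": 0.01, "D11S1783": 0.0001, "H0570POLYA": 0.03}
for marker, p in tsp_p_values.items():
    print(f"{marker}: {neg_log10(p):.2f}")
```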
As association of a particular allele of a marker to the
disease is likely to occur when marker and disease mutation
are close (within 2Mb), genes within this interval are
candidates. The SAPL gene is within 200kb of H0570POLYA, in a
region showing strong association with IDDM, hence single
nucleotide polymorphisms within this gene and its regulatory
regions are candidates for the aetiological mutation IDDM4.
TABLE 1
OLIGONUCLEOTIDE PRIMERS
4dest4 1f (-26)
CCGCCTGAGCGCAACTAG
4dest4 2f (0)
TCGTGGGCACCTCCAGATAAG
4dest4 5f (24)
ACAAGCTCAGAGAGATGTGGTG
4dest4 3f (66)
AACTTCCTCGGCCATATGG
4dest4 4F (120)
GGGAGAGCTTGTTTCATATCC
4dest4 3r (216)
TCTTCTTTGTGGCTCCTTGC
4dest4 1r (443)
CGGTTCTGAGCTTTACATTCC
4dest4 6f (565)
GGGAGAAGATGAATCCTTGC
4dest4 7f (619)
CCCTTTGAATCCACTACTTGC
4dest4 2r (957)
ATTTGTTGCTCAGGCTCCTG
4dest4 8f (1065)
CAGCCATAGTCAGTGCAATCC
4dest4 4r (1067)
TGGATTGCACTGACTATGGC
4dest4 11f (1497)
TGGGACACCTAACGAGGATAGC
4dest4 9F (1582)
AGATCCTCCGACGAAGTCAG
4dest4 5r (1602)
CTGACTTCGTCGGAGGATCT
4dest4 6r (1765)
GCCAATGTCATCTTGATCTGC
4dest4 10f (2012) PAIR WITH 8R
CAAGACTTGTTTGAACCCAGC
4dest4 7r (2189)
TCTCTTTAGTTGGCATCGGC
4dest4 8r (2391)
CTTTCTGCATCCTCCTCTCC
4dest4 12f (2515)
AGATGCTGCTTGTAAAGACGC
4dest4 12r (2643)
ACTGAAGTGTCACCTGGTGC
4dest4 14f (2909)
GCCTGTGAAATAAGATCTTGCC
4dest4 14r (2930)
GGCAAGATCTTATTTCACAGGC
4dest4 15r (3376)
CAAGCAAACAAGACTTGAACAG
4dest4A 13R (3876)
TGAGCTGTTTGAGAAGGCTG
4dest4A 11R (4193)
AGTGCTGGAATCTCCACACC
4dest4A 13F (4301)
TGAAGAGACTGTCCTTGGGC
4dest4A 10R (4691)
CCCATTGTCATATCCTTTCCC
4dest4A 9R (4786)
TTCAGTATGGCCAACACACAG
Vector Primers for RCCA
PBS.543R
GGGGATGTGCTGCAAGGCGA
PBS.578R
CCAGGGTTTTCCCAGTCACGAC
PBS.838F
TTGTGTGGAATTGTGAGCGGATAAC
PBS.873F
CCCAGGCTTTACACTTTATGCTTCC
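Several forward and reverse primers in the list above target the same stretch of cDNA on opposite strands, which can be confirmed by taking the reverse complement. The Python sketch below checks one such pair, 4dest4 9F and 4dest4 5r, using the sequences as listed; the helper name is hypothetical and only the unambiguous bases A, C, G and T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences as given in the list of oligonucleotide primers.
PRIMERS = {
    "4dest4 9F": "AGATCCTCCGACGAAGTCAG",
    "4dest4 5r": "CTGACTTCGTCGGAGGATCT",
}

same_site = reverse_complement(PRIMERS["4dest4 9F"]) == PRIMERS["4dest4 5r"]
print("4dest4 9F and 4dest4 5r cover the same site on opposite strands:", same_site)
```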
TABLE 2
DM4E4 POLYMORPHISMS
Location Polymorphism 5' Context 3' Context
3297 delete AAGTA AGATTAAGTA TTTATTGCTA
3488 G to A transition TI'1'I'I'GTTTC TITI'GGTAGTT
3680 G to A transition TATTTTAAAA TAGAAATCAA
4 I delete TTA GTCTAATGCC TTATTTC'I GA
43
Nucleotide location numbers are based on DM4E4a sequence (Figure XX).
TABLE 3
DM4E4 Intron/Exon Boundaries
Exon Size 5' 3' Intron Size
1 60+ TCCAGgtaa 1 unknown
2 106 tacagATAAG T'PGAGgtacc 2 > 13,000
3 150 tacagGAGCT GAAAGgtaag 3 18,014
4 233 tttagACCAG TACAAgtaag 4 6,964
5 187 tctagGTATC AACAGgtaaa 5 3,043
6 129 tctagATTGT AAGATgtgct 6 unknown
Exons 1-6 account for the first 820 nucleotides of DM4E4a cDNA
TABLE 4
Prosite Motifs in SAPL
Residue Number Motif
279->282 CAMP_PHOSPHO_SITE
458->461 CAMP_PHOSPHO_SITE
556->559 CAMP_PHOSPHO_SITE
23->25 PKC_PHOSPHO_SITE
133->135 PKC_PHOSPHO_SITE
278->280 PKC_PHOSPHO_SITE
421->423 PKC_PHOSPHO_SITE
456->458 PKC_PHOSPHO_SITE
554->556 PKC_PHOSPHO_SITE
651->653 PKC_PHOSPHO_SITE
655->657 PKC_PHOSPHO_SITE
706->708 PKC_PHOSPHO_SITE
11->14 CK2_PHOSPHO_SITE
15->18 CK2_PHOSPHO_SITE
23->26 CK2_PHOSPHO_SITE
171->174 CK2_PHOSPHO_SITE
202->205 CK2_PHOSPHO_SITE
214->217 CK2_PHOSPHO_SITE
233->236 CK2_PHOSPHO_SITE
274->277 CK2_PHOSPHO_SITE
304->307 CK2_PHOSPHO_SITE
315->318 CK2_PHOSPHO_SITE
339->342 CK2_PHOSPHO_SITE
351->354 CK2_PHOSPHO_SITE
360->363 CK2_PHOSPHO_SITE
362->365 CK2_PHOSPHO_SITE
366->369 CK2_PHOSPHO_SITE
452->455 CK2_PHOSPHO_SITE
537->540 CK2_PHOSPHO_SITE
563->566 CK2_PHOSPHO_SITE
569->572 CK2_PHOSPHO_SITE
571->574 CK2_PHOSPHO_SITE
628->631 CK2_PHOSPHO_SITE
642->645 CK2_PHOSPHO_SITE
651->654 CK2_PHOSPHO_SITE
660->663 CK2_PHOSPHO_SITE
666->669 CK2_PHOSPHO_SITE
697->700 CK2_PHOSPHO_SITE
744->747 CK2_PHOSPHO_SITE
772->775 CK2_PHOSPHO_SITE
293->298 MYRISTYL
561->566 MYRISTYL
717->722 MYRISTYL