Patent 2321963 Summary

(12) Patent Application: (11) CA 2321963
(54) English Title: UNIQUE IDENTIFIER FOR BIOLOGICAL SAMPLES
(54) French Title: IDENTIFICATEUR UNIQUE POUR ECHANTILLONS BIOLOGIQUES
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • C12Q 1/68 (2006.01)
(72) Inventors :
  • BING, DAVID H. (United States of America)
  • WILLIAMSON, JANICE M. (United States of America)
(73) Owners :
  • GENOMICS COLLABORATIVE, INC. (United States of America)
(71) Applicants :
  • GENOMICS COLLABORATIVE, INC. (United States of America)
(74) Agent: NORTON ROSE FULBRIGHT CANADA LLP/S.E.N.C.R.L., S.R.L.
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 1999-02-25
(87) Open to Public Inspection: 1999-09-02
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/US1999/004094
(87) International Publication Number: WO1999/043855
(85) National Entry: 2000-08-23

(30) Application Priority Data:
Application No. Country/Territory Date
60/076,081 United States of America 1998-02-26

Abstracts

English Abstract




The present invention provides a method for internal labelling of a biological
sample by which the sample is identifiably linked to its source and other
relevant information, based on the polymorphisms inherent in the sample
itself. A set of polymorphisms in the sample is detected, and the resulting
data is used as a unique identifier which is then used to identify the sample.
This unique identifier can also be used to identify the source of the sample,
and any other relevant information.


French Abstract

L'invention concerne un procédé permettant de marquer un échantillon biologique, de manière interne, l'échantillon étant lié de manière identifiable à sa source et à d'autres informations significatives basées sur les polymorphismes inhérents à l'échantillon lui-même. On détecte un ensemble de polymorphismes dans l'échantillon, et on utilise les données résultantes comme identificateur unique qui sert ensuite à identifier l'échantillon. On peut également utiliser l'identificateur unique pour identifier la source de l'échantillon et toute autre information significative.

Claims

Note: Claims are shown in the official language in which they were submitted.



CLAIMS

What is claimed is:

1. A method for producing a unique identifier for a biological sample, the
method comprising:
(a) detecting one or more polymorphisms within the biological sample;
and
(b) selecting one or more polymorphisms sufficient to form a unique
identifier;
thereby producing a unique identifier for a biological sample.
2. The method of Claim 1, wherein the biological sample is taken from an
organism selected from the group consisting of: vertebrates, invertebrates,
plants, and microorganisms.
3. The method of Claim 1, wherein the biological sample is from a mammal.
4. The method of Claim 3, wherein the mammal is a human.
5. The method of Claim 3, wherein the biological sample is selected from a
group consisting of blood, saliva, hair, body fluid, tissues, organs, and one
or more cells.
6. The method of Claim 5, wherein the polymorphism is selected from the
group consisting of nucleic acid polymorphisms, protein polymorphisms,
enzyme polymorphisms, chemical polymorphisms, biochemical
polymorphisms, phenotypic polymorphisms, and quantitative
polymorphisms.



7. The method of Claim 6, wherein the polymorphism is a nucleic acid
sequence polymorphism.
8. The method of Claim 7, wherein the polymorphism is a nucleic acid length
polymorphism.
9. The method of Claim 8, wherein the polymorphism is a short tandem repeat
(STR).
10. The method of Claim 1, wherein the unique identifier is also linked to the
source of the biological sample.
11. The method of Claim 1, wherein the unique identifier is also linked to
relevant information about the biological sample or the source of the
biological sample.
12. The method of Claim 1, wherein the unique identifier is selected from the
group consisting of an alphanumeric string, and a bar code.
13. A method for establishing a repository containing a collection of
biological samples, wherein each biological sample has a unique identifier
associated with it, the method comprising:
(a) obtaining a biological sample from a source;
(b) detecting one or more polymorphisms in the sample;
(c) selecting one or more polymorphisms sufficient to form a unique
identifier;
(d) using the unique identifier to identify the sample;
(e) storing the sample with the unique identifier;
(f) repeating steps (a) through (e) for biological samples from other
sources;
thereby establishing a repository containing a collection of biological
samples, wherein each such biological sample has a unique identifier
associated with it.
14. The method of Claim 13, wherein the samples are DNA-containing samples.
15. The method of Claim 13, wherein the sample source is a human.
16. The method of Claim 13, wherein the polymorphism is selected from the
group consisting of nucleic acid polymorphisms, protein polymorphisms,
enzyme polymorphisms, chemical polymorphisms, biochemical
polymorphisms, phenotypic polymorphisms, and quantitative
polymorphisms.
17. The method of Claim 16, wherein the polymorphism is a short tandem repeat
(STR).
18. The method of Claim 13, wherein the unique identifier is selected from the
group consisting of an alphanumeric string, and a bar code.
19. The method of Claim 13, wherein the unique identifier is also linked to
the source of the biological sample.
20. The method of Claim 13, wherein the unique identifier is also linked to
relevant information about the biological sample or the source of the
biological sample.
21. A method of determining, by means of a unique identifier, if a source is
represented by a sample within the repository of Claim 15, the method
comprising:
(a) obtaining a sample from the source;
(b) detecting one or more polymorphisms in the sample;
(c) selecting one or more polymorphisms sufficient to form a unique
identifier, wherein the polymorphisms selected are those used to form
unique identifiers for the samples within the repository;
(d) comparing the unique identifier of (c) to the unique identifier of each
sample in the repository;
wherein shared identity between the unique identifier of (c) to a unique
identifier of a sample in the repository indicates that the source is
represented by a sample within the repository.
22. The method of Claim 21, wherein the samples are DNA-containing samples.
23. The method of Claim 21, wherein the source of the sample is a human.
24. The method of Claim 21, wherein the polymorphism is selected from the
group consisting of nucleic acid polymorphisms, protein polymorphisms,
enzyme polymorphisms, chemical polymorphisms, biochemical
polymorphisms, phenotypic polymorphisms, and quantitative
polymorphisms.
25. The method of Claim 24, wherein the polymorphism is a short tandem repeat
(STR).
26. The method of Claim 21, wherein the unique identifier is also linked to
the source of the biological sample.
27. The method of Claim 21, wherein the unique identifier is also linked to
relevant information about the biological sample or the source of the
biological sample.



28. The method of Claim 21, wherein the unique identifier is selected from the
group consisting of an alphanumeric string, and a bar code.
29. A method for linking, by means of a unique identifier, a member of a first
group with a member of a second group, wherein the first group comprises a
biological sample lacking a unique identifier, and a source of a biological
sample lacking a unique identifier, and wherein the second group comprises
a biological sample having a unique identifier, a source of a biological
sample having a unique identifier, or information having a unique identifier,
the method comprising:
(a) detecting one or more polymorphisms in the member of the first
group;
(b) selecting one or more polymorphisms sufficient to form a unique
identifier, wherein the polymorphisms selected are those used to form
the unique identifier for the members of the second group;
(c) comparing the unique identifier of the member of the first group to
the unique identifier of the member of the second group;
wherein shared identity between the unique identifier of the member of the
first group and the unique identifier of member of the second group, links the
member of the first group with the member of the second group.

Description

Note: Descriptions are shown in the official language in which they were submitted.



CA 02321963 2000-08-23
WO 99/43855 PCT/US99/04094
-I-
UNIQUE IDENTIFIER FOR BIOLOGICAL SAMPLES
RELATED APPLICATION(S)
This application claims priority to application 60/076,081, filed February 26,
1998, the entire teachings of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
With the advent of the Human Genome Project and the advances in technology
that have resulted, biological and genetic testing have become increasingly
common. Hospitals and other health care entities are using new tests for
diseases, and are processing more samples for testing than ever before. The
ease and speed of many biological tests have also increased enormously, so
that these tests are now being widely used outside of the health care
industry. Veterinarians, of course, have always closely followed advances in
human health care. But law enforcement agencies now routinely employ DNA-based
methods in forensics, and even population geneticists, ecologists, and
evolutionary biologists use these methods to track the evolution and
variability within and between populations of organisms.
When handling large numbers of samples, accurate and reliable tracking of
samples and quality control of associated information is vital. In hospital
settings, aberrant test results are always a cause for concern because doubts
are then cast on the state of the patient's health. In a tissue repository, it
must be possible for a sample (or portions of a sample) to be reliably and
repeatedly retrieved with no doubts as to the sample's identity. Mislabeling
or loss of labeling of a sample may mean that the sample is rendered useless
if it cannot be accurately connected back to the sample's history and/or
source. Most samples and their sources are given a common alphanumeric
designation, and this designation is also linked to information about the
source and the sample (e.g., patient name, sample type, disease condition,
etc.). A loss of this designation from its association with either the source,
or the sample, or the information will often result in a complete loss of
utility of all three.
This potential loss of association between the designation and the sample is
especially likely in settings where very large numbers of samples are being
processed. Machine errors, while problematic, generally result in the
destruction of large numbers of samples, and so are noticed easily. Human
error, however, has the potential to cause serious errors that go unnoticed
for a period of time. These include transcription errors, misplacing or
swapping of samples, destruction of labels, and off-by-one errors (resulting
in a series of samples where the designation or information from each sample
is misassociated with the next sample). In addition, pages from lab notebooks
can be obliterated or lost, and magnetic media corrupted. Databases containing
all of this information can be backed up, but intervening data added to the
database since the last backup is usually lost. If an error is introduced and
not discovered until after a backup is made, then this error effectively
replaces the "true" data. In addition, many facilities save only the most
recent backup, or store backups at the same site as the current data,
resulting in loss of all information in the event of a physical disaster
(e.g., a fire).
SUMMARY OF THE INVENTION
The present invention relates to a method of creating a unique identifier for
reliably identifying samples, their sources, and associated information. The
use of the identification system described herein substantially decreases
potential mix-ups and misidentification of samples, their sources, and
associated information.
Specifically, the present invention provides a method for creating a unique
identifier which is used to label the sample, its source, or the associated
information, based on the polymorphisms inherent in the sample and its source.
One or more polymorphisms in the sample are detected, and the resulting
polymorphism data are used to produce a unique identifier, which is then used
to identify the sample. This unique identifier can also be linked with the
source, and/or any information that may be associated with either the sample
or the source (i.e., the unique identifier can be used as a common designation
for the sample, its source and/or other relevant information). If this unique
identifier is separated from the sample, then the polymorphisms within the
sample simply need to be re-detected to reproduce the polymorphism data, which
are then used to produce the unique identifier, thereby recreating the proper
unique identifier, and, ultimately, its link to its source.
In general, the invention features a method for producing a unique identifier
for a biological sample, comprising detecting one or more polymorphisms within
the biological sample, and selecting one or more polymorphisms sufficient to
form a unique identifier. The biological sample can be from a vertebrate, an
invertebrate, a plant, or consist of microorganisms. The biological sample can
also be from a mammal, particularly a human. The sample can be blood, saliva,
hair, body fluid, tissues, organs, one or more cells, or a whole organism. The
polymorphisms can be nucleic acid polymorphisms, protein polymorphisms, enzyme
polymorphisms, chemical polymorphisms, biochemical polymorphisms, phenotypic
polymorphisms, and quantitative polymorphisms, particularly a nucleic acid
sequence polymorphism, a nucleic acid length polymorphism, or a short tandem
repeat (STR). The unique identifier can also be linked to the source of the
biological sample, or relevant information about the biological sample or the
source of the biological sample. The unique identifier can be in the form of
an alphanumeric string, or a bar code.
The invention also features a method for establishing a repository containing
a collection of biological samples, comprising obtaining a biological sample
from a source, detecting one or more polymorphisms in the sample, selecting
one or more polymorphisms sufficient to form a unique identifier, using the
unique identifier to identify the sample, storing the sample with the unique
identifier, and repeating these steps for biological samples from other
sources. The samples, in general, are DNA-containing samples, particularly
from humans, and the polymorphisms are nucleic acid polymorphisms, protein
polymorphisms, enzyme polymorphisms, chemical polymorphisms, biochemical
polymorphisms, phenotypic polymorphisms, and quantitative polymorphisms, or
short tandem repeats (STRs). The unique identifier can be in the form of an
alphanumeric string, or a bar code, and can also be linked to the source of
the biological sample, or relevant information about the biological sample or
the source of the biological sample.
In addition, the invention features a method of determining, by means of a
unique identifier, if a source is represented by a sample within the
repository, comprising obtaining a sample from the source, detecting one or
more polymorphisms in the sample, selecting one or more polymorphisms
sufficient to form a unique identifier, and comparing the unique identifier so
produced to the unique identifier of each sample in the repository, where
shared identity between the two unique identifiers indicates that the source
is already represented in the repository. In general, the samples are
DNA-containing samples, preferably from humans. The polymorphisms are nucleic
acid polymorphisms, particularly short tandem repeats (STRs), protein
polymorphisms, enzyme polymorphisms, chemical polymorphisms, biochemical
polymorphisms, phenotypic polymorphisms, and quantitative polymorphisms. The
unique identifier can also be linked to the source of the biological sample,
or relevant information about the biological sample or the source of the
biological sample. The unique identifier can be in the form of an alphanumeric
string, or a bar code.
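
The repository check described in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's implementation: the locus names, the allele values, the storage-location strings, and the function names `make_identifier` and `find_in_repository` are all assumptions. The identifier here is simply a tuple of sorted allele pairs taken at a fixed, agreed list of loci, so that the same loci are used for the new sample and for every sample already stored.

```python
from typing import Dict, Optional, Tuple

# A genotype maps a locus name to the pair of alleles observed at that locus.
Genotype = Dict[str, Tuple[int, int]]
Identifier = Tuple[Tuple[int, ...], ...]

def make_identifier(genotype: Genotype, loci: Tuple[str, ...]) -> Identifier:
    """Build an identifier from the selected loci, always in the same locus order."""
    # Sorting the alleles makes (3, 1) and (1, 3) produce the same identifier.
    return tuple(tuple(sorted(genotype[locus])) for locus in loci)

def find_in_repository(repository: Dict[Identifier, str],
                       genotype: Genotype,
                       loci: Tuple[str, ...]) -> Optional[str]:
    """Return the stored record whose identifier matches, or None if there is none."""
    return repository.get(make_identifier(genotype, loci))

# Hypothetical loci and records, for illustration only.
LOCI = ("L1", "L2", "L3")
stored = {"L1": (1, 3), "L2": (1, 2), "L3": (3, 3)}
repository = {make_identifier(stored, LOCI): "freezer 2, box 7"}

# A fresh sample from the same source, with alleles reported in a different order.
new_sample = {"L1": (3, 1), "L2": (2, 1), "L3": (3, 3)}
print(find_in_repository(repository, new_sample, LOCI))
```

Using the identifier as a dictionary key turns "compare against each sample in the repository" into a single lookup; a real system would also have to allow for genotyping error, which this sketch ignores.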
The invention also features a method for linking, by means of a unique
identifier, a first biological object lacking a unique identifier with a
second object having a unique identifier, comprising detecting one or more
polymorphisms in the first biological object, selecting one or more
polymorphisms sufficient to form a unique identifier, and comparing the unique
identifier so made to the unique identifier of the second object, where shared
identity between the two unique identifiers links the first biological object
with the second object. The biological sample can be from a vertebrate, an
invertebrate, a plant, or consist of microorganisms. The biological sample can
also be from a mammal, particularly a human. The sample can be blood, saliva,
hair, body fluid, tissues, organs, one or more cells, or a whole organism. The
polymorphisms can be nucleic acid polymorphisms, particularly short tandem
repeats (STRs), protein polymorphisms, enzyme polymorphisms, chemical
polymorphisms, biochemical polymorphisms, phenotypic polymorphisms, and
quantitative polymorphisms, particularly a nucleic acid sequence polymorphism,
or a nucleic acid length polymorphism. The unique identifier can also be
linked to the source of the biological sample, or relevant information about
the biological sample or the source of the biological sample. The unique
identifier can be in the form of an alphanumeric string, or a bar code.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a method for creating a unique identifier for
identifying biological samples, their sources, or associated information based
on the polymorphisms inherent in the sample and source themselves. In this
method, the nucleic acid contained within the sample itself is used to produce
the unique identifier for identifying and linking the sample, its source, and
associated information. No external material needs to be added to the sample
which could dilute or alter the accuracy of other test results. An advantage
of the invention, therefore, is that it is unnecessary to add "identifying
sequences" to the samples, and that without such additions, one may conduct
studies of genetics, disease associations, evolutionary relationships, etc.,
without the results being tainted by the added identifying sequences.
A "source" or "the source from which the sample is derived" refers to the
originating material for a sample. A source of a biological sample, for
example, can be a human, any animal, plant, insect, or a population or strain
of microorganisms. A source of a biological sample does not have to be living,
and can be a deposit in a tissue repository, herbaria or museum specimens,
forensic specimens, or fossils. A "potential source" as used herein, means a
source from which the sample may possibly have been taken in the past.
By "sample" is meant a portion of source biological material that originated
elsewhere, i.e., the sample was removed from its source. A sample can be any
biological sample (e.g., blood, saliva, hair, organs, biopsies, bodily fluids,
one or more cells), and can be taken from any vertebrate, including mammals
such as humans, or from a plant, insect, or reptile. The sample can also be a
strain or mixed population of microbes. Samples can also be biological
materials taken from defunct or extinct organisms, e.g., samples can be taken
from pressed plants in herbarium collections, or from pelts, taxidermy
displays, fossils, or other materials in museum collections.
"Information associated with" the sample or source from which the sample is
derived is meant to include, without limitation, any information that might be
necessary or advantageous to be linked to the sample or the sample source,
e.g., name, address, sex, medical history (in the case of human samples),
species, collection data, provenance (in the case of non-human samples), etc.
Once a biological sample is taken from its source, the sample is tested across
one or more polymorphic loci, and the polymorphic data produced are used to
create a unique identifier, which is identifiably linked to the sample, and
serves as its unique designation. This unique identifier can also be
identifiably linked to the sample's source, and/or any information that may
exist concerning the source and/or the sample. That the unique identifier is
"identifiably linked" to the sample, sample source, or related information
means that it is connected in some way with any or all of these three things,
e.g., the unique identifier may be on a label attached to a container holding
the sample, the unique identifier may exist as a field in a database record
containing medical data regarding the source, etc. In essence, the genetic
code of the sample itself, which is unique and forms the basis of the
polymorphisms tested, serves as the unique identifier. Because the unique
identifier is based on the genetic code, which is unique between individuals,
the unique identifier will also be unique between samples from different
source individuals.
A "polymorphism" is an allelic variation between two samples. As used herein,
the term includes differences between proteins (e.g., enzymes, blood groups,
blood proteins), differences in the chemicals and biochemicals (e.g.,
secondary metabolites) produced by the source organism(s), differences between
nucleic acids involving differences in the nucleotide sequence (e.g.,
restriction site maps), or differences in length of a stretch of nucleic acid
(e.g., RFLPs (restriction fragment length polymorphisms), microsatellites,
STRs (short tandem repeats), SSRs (simple sequence repeats), SSLPs (simple
sequence length polymorphisms), VNTRs (variable number tandem repeats)).
Allelic variation can also result in phenotypic (i.e., visually apparent)
polymorphisms, or variations in quantitative characters (e.g., variation in
height, length, yield of fruit, etc.) between the organisms that serve as the
source of the samples. With some types of biological material, phenotypic
differences may be visible in the samples themselves, e.g., kernels of
different types of "Indian" corn often appear very different from each other,
with red, yellow, white, blue, streaked kernels, etc. With such samples,
phenotypic polymorphisms could also be used to produce the unique identifier.
A polymorphism is not limited by the function or effect it may have on the
organism as a whole, and can therefore include allelic differences which may
also be a mutation, insertion, deletion, point mutation, or structural
difference, as well as a strand break or chemical modification that results in
an allelic variant. A polymorphism between two nucleic acids can occur
naturally, or be caused intentionally by treatment (e.g., with chemicals or
enzymes), or can be caused by circumstances normally associated with damage to
nucleic acids (e.g., exposure to ultraviolet radiation, mutagens or
carcinogens).
As used herein, a "sequence polymorphism" is a difference in the sequence
of two nucleic acids or two amino acid sequences. Two amino acid sequences can
differ by having different residues at a particular position (i.e., an amino
acid substitution), or some residues may be deleted, or new residues inserted
or added to one or more ends. Two nucleic acids differing in sequence may have
the same number of base
pairs (e.g., "AT~C" vs. "AT~C"), but may also include some differences in
overall sequence length as well (e.g., "ATCACATG" vs. "ATCACACATG"). Types of
commonly-studied polymorphisms caused by sequence differences include
restriction site polymorphisms, isozymes, differences in protein conformation,
and length polymorphisms. If the nucleic acid is sequenced, then a sequence
difference itself (as represented by the string of letters) serves as the
polymorphism.
As used herein, a "length polymorphism" is a difference in the length of two
nucleic acids. Two different nucleic acids with a length polymorphism between
them also have a sequence polymorphism, but many methods used to detect a
length polymorphism do not reveal the exact sequence polymorphism.
Commonly-used types of length polymorphisms include RFLPs (restriction
fragment length polymorphisms), microsatellites, STRs (short tandem repeats),
SSRs (simple sequence repeats), SSLPs (simple sequence length polymorphisms),
and VNTRs (variable number tandem repeats).
In general, the difference between "length polymorphisms" and "sequence
polymorphisms" lies in the methods used to detect them. With RFLPs, for
example, restriction endonucleases are used to cut a nucleic acid molecule
into fragments, which are then separated on an agarose gel. The differences
between two individuals are measured by the changes in size of the resultant
nucleic acid fragments, and so are referred to as length polymorphisms, yet
those differences are caused by differences in the underlying sequence, which
is the basis for the change in restriction sites, and therefore the changes in
the sizes of the nucleic acid fragments. Because the method of
detection/visualization can only differentiate on the basis of fragment
length, the RFLPs are generally classed as length polymorphisms.
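
The RFLP point above, that a change in sequence is observed only as a change in fragment lengths, can be illustrated with a toy digest in Python. This is a sketch under stated assumptions rather than a laboratory protocol: the recognition site GAATTC (the EcoRI site), the cut position one base into the site, the two short made-up sequences, and the function name `fragment_lengths` are all chosen purely for illustration.

```python
from typing import List

def fragment_lengths(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Cut `sequence` at every occurrence of `site`; return fragment lengths, longest first.

    `cut_offset` is the position inside the recognition site where the cut is made.
    """
    lengths = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)   # fragment ending at this cut
        start = cut
        pos = sequence.find(site, pos + 1)
    lengths.append(len(sequence) - start)  # trailing fragment after the last cut
    return sorted(lengths, reverse=True)

# Two made-up alleles of equal length; allele_b has one base changed in the second site.
allele_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
allele_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"

print(fragment_lengths(allele_a))  # [14, 7, 5]
print(fragment_lengths(allele_b))  # [21, 5]
```

Both alleles are 26 bases long and differ at a single position, yet the digest yields three fragments for one and two for the other. A gel would show only those fragment sizes, which is why the polymorphism is classed by how it is detected.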
As used herein, a "polymorphic locus" is a segment of nucleic acid which
may contain a polymorphism as described above. It is not required that the
precise sequence of the nucleic acid be known. A polymorphic locus is not
limited to those loci which are polymorphic in all situations, e.g., a
polymorphic locus which displays an allelic variation between individuals A
and B, but not between individuals A and C, remains a polymorphic locus for
purposes of comparing individuals A and B, as well as individuals B and C.
"Nucleic acid" means deoxyribonucleic acid (DNA), ribonucleic acid
(RNA), nucleic acids from mammals or other animals, plants, insects, bacteria,
viruses, or other organisms.
By "unique identifier" is meant an identification tag, designation, or code to
be linked to a sample, its source, or other information, such as patient case
history, disease testing results, genetic testing results, geographic or
temporal collection data, or any other information which may be useful when
linked with the sample or source. The unique identifier can exist in the form
of an alphanumeric string, a bar code, an entry in a database, or any other
useful human-readable or machine-readable form.
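
As one possible reading of "alphanumeric string" in the definition above, the sketch below encodes a genotype as a canonical text string and derives a short fixed-length label from it. Nothing here is prescribed by the patent: the encoding scheme, the function names `encode_genotype` and `short_label`, the use of SHA-256, and the allele values are assumptions made for illustration (D3S1358, FGA and VWA are used only as example locus names).

```python
import hashlib
from typing import Dict, Tuple

def encode_genotype(genotype: Dict[str, Tuple[int, ...]]) -> str:
    """Canonical string: loci sorted by name, alleles sorted within each locus."""
    fields = []
    for locus in sorted(genotype):
        alleles = [str(a) for a in sorted(genotype[locus])]
        fields.append("-".join([locus] + alleles))
    return "/".join(fields)

def short_label(identifier: str, length: int = 12) -> str:
    """Fixed-length label derived by hashing the identifier.

    A truncated hash can collide in principle, so the full string stays authoritative.
    """
    digest = hashlib.sha256(identifier.encode("ascii")).hexdigest()
    return digest[:length].upper()

# Made-up allele values for three example loci.
genotype = {"VWA": (16, 16), "D3S1358": (17, 15), "FGA": (21, 24)}
identifier = encode_genotype(genotype)
print(identifier)               # D3S1358-15-17/FGA-21-24/VWA-16-16
print(short_label(identifier))  # 12-character label suitable for a tube or bar code
```

Sorting both the loci and the alleles means that re-testing the same sample reproduces the same string regardless of the order in which results are reported, which is the property the identifier depends on.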
The extent to which such an identifier will be unique depends on the loci
chosen for polymorphism testing. Allelic polymorphism has been studied for
decades, and there are many genetic systems which have been commonly used in
assessing polymorphism between populations or individuals. These include
classical blood groups, blood proteins, isozymes, distribution of restriction
endonuclease sites, restriction fragment length polymorphisms (RFLPs), and
others. The most successful to date, however, have been microsatellites, also
known as short tandem repeats (STRs), or simple sequence length polymorphisms
(SSLPs), or variable number tandem repeats (VNTRs).
STRs are stretches of DNA that consist of repeated sequences. The base
sequence is usually just a few base pairs long, typically two to twelve base
pairs, but longer base repeats have been seen. This base sequence is then
tandemly repeated, and the number of times it is repeated can vary greatly,
depending on the STR locus being studied. An STR can therefore be expressed as
(X)n, where X is the repeated sequence (e.g., "CA") and n is the number of
times that X is repeated.
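
The (X)n notation can be made concrete with a short Python sketch that recovers n for a given repeat unit X from a sequence. The flanking sequences, the example alleles, and the function name `longest_tandem_repeat` are hypothetical; in practice STR alleles are sized from amplified fragments rather than read directly from sequence in this way.

```python
import re

def longest_tandem_repeat(sequence: str, unit: str) -> int:
    """Return n for the longest uninterrupted run (unit)n found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Two made-up alleles of the same locus: identical flanks, different repeat counts.
allele_1 = "GGT" + "CA" * 5 + "TTG"
allele_2 = "GGT" + "CA" * 8 + "TTG"

print(longest_tandem_repeat(allele_1, "CA"))  # 5
print(longest_tandem_repeat(allele_2, "CA"))  # 8
```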
Most individuals in a population will have the same STR at the same
location in the genome, that is, different individuals will have the same base
repeat at the same location, but the precise number of repeats often varies
from individual to individual. For example, for a given STR mapped to a
particular location on the genome, the base sequence may be repeated 5 times
in individual A, but may be repeated 8 times in individual B and 20 times in
individual C.
These tandem repeats are believed to be caused by "slippage" of the DNA
polymerase enzyme as the DNA is replicated. In general, n increases over
generations, and the amount of slippage varies over time and in different
lines. The variability of these repeated sequences is generally correlated to
the length of the base repeat, with STRs composed of longer base repeats
exhibiting less variability between individuals than shorter base repeats. For
example, a two base pair repeat may consist of a two base pair unit being
repeated hundreds of times in an individual, while a 12-base pair unit may
only be repeated a few times. In general, the amount of slippage that occurs
during replication, and therefore the amount of variability in the number of
repeats that results from that slippage, is also correlated to the length of
the base repeat. Short repeats tend to exhibit higher rates of polymorphism
between individuals, while 10- or 12-base pair repeats may show little or no
variability.
STRs can be amplified and detected by known procedures. For example,
they can be detected by electrophoretic separation followed by radionuclide or
fluorescent labeling, or silver staining. They have many advantages over other
methods of detecting polymorphisms (e.g., RFLPs) because of their small size,
the ease and speed with which they can be detected and analyzed, and the fact
that the process is amenable to automation. The more recent generations of
large-scale genetic maps have been made using STRs (Hudson, T.J., et al.,
Science 270:1945-1954 (1995); Dietrich, W.F., et al., Nature Genetics
7:220-245 (1994); Yerle, M., et al., Mamm. Genome 6:176-186 (1995); Jacob,
H.J., et al., Nature Genetics 9:63-69 (1995)). Because of the extremely high
rate of polymorphism of some of the STR loci, they are also used in forensic
tests by law enforcement agencies.
A number of kits for amplifying STR loci are commercially available, and
the rates of polymorphism of these loci in different ethnic backgrounds are
known. These include AmpFlSTR™ Profiler, AmpFlSTR™ Profiler Plus, AmpFlSTR™
Green I (PE Applied Biosystems, Foster City, California, USA), the Geneprint™
STR Systems (Promega Corp., Madison, WI), including Geneprint™ PowerPlex™ 1.1,
Geneprint™ PowerPlex™ 1.2, Geneprint™ PowerPlex™ 2, and Geneprint™ PowerPlex™
16, Sex Determination Systems, and others. These STR systems were developed
for use in humans, but microsatellite markers have been developed in other
organisms, including horse, cattle, sheep, goat, dog, pig, mouse, rat, barley,
corn, soybean, and others.
These loci can be used singly, or can be combined, depending on the power
of discrimination required. As the number of organisms being studied and the
number of individuals from which samples are removed and archived increases,
the degree of polymorphism required to uniquely identify each sample also
increases, and the number of polymorphic loci that need to be tested to have a
sufficient number to create the unique identifier also increases.
For example, if three individuals possessed the following alleles at three
different loci:

               Locus 1   Locus 2   Locus 3
Individual A   1,3       1,2       3
Individual B   2,3       2,2       1,4
Individual C   2,3       1,5       3,5

then detection of the alleles at Locus 1 would allow a sample from Individual
A to be distinguished from a sample from B or C, but samples from B and C could
not be uniquely distinguished from each other, and a second locus would need to
be tested. On the other hand, Locus 2 alone could serve as the unique
identifier, because by itself, it can serve to distinguish between samples from
all three individuals.
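
The three-individual comparison above can be checked mechanically. The following is a minimal sketch in Python, not part of the original disclosure; the dictionary layout and the function name `is_unique_identifier` are illustrative assumptions. It tests whether a chosen set of loci gives every individual a distinct allele profile.

```python
# Genotypes from the three-individual example; each genotype is stored as a
# tuple of allele labels (a single reported allele is a one-element tuple).
genotypes = {
    "A": {"Locus 1": (1, 3), "Locus 2": (1, 2), "Locus 3": (3,)},
    "B": {"Locus 1": (2, 3), "Locus 2": (2, 2), "Locus 3": (1, 4)},
    "C": {"Locus 1": (2, 3), "Locus 2": (1, 5), "Locus 3": (3, 5)},
}


def is_unique_identifier(loci):
    """Return True if the chosen loci give every individual a distinct profile."""
    profiles = [tuple(g[locus] for locus in loci) for g in genotypes.values()]
    return len(set(profiles)) == len(profiles)


print(is_unique_identifier(["Locus 1"]))             # B and C share 2,3
print(is_unique_identifier(["Locus 2"]))             # all three differ
print(is_unique_identifier(["Locus 1", "Locus 3"]))  # a second locus resolves B vs C
```

Locus 1 alone fails because Individuals B and C share the same alleles, whereas Locus 2 alone, or Locus 1 combined with a second locus, separates all three.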
The Power of Discrimination (PD) of a given system of loci is defined as the
probability that two individuals selected at random will differ with respect to
that given system of loci. The PD is related to the Probability of Identity
(P_I) by the equation
    PD = 1 - P_I,
where P_I is determined by solving the equation
    P_I = \sum_i X_i^2,
where X_i is the frequency in the population of the ith allele. The allelic
frequencies within different ethnic populations are known for many of the
polymorphisms of the STR loci in the commercially-available kits, so a set of
STRs which will provide a unique identifier for every sample can be chosen,
even if the final number of samples is not known. Combinations of loci can be
chosen that have matching probabilities of less than 1 in several million or
more (see, for example, Table 1).
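
As a worked illustration of the two equations above, the sketch below computes P_I and PD for a single locus. The allele frequencies are hypothetical values chosen only for the arithmetic, and the function names are assumptions, not part of the original disclosure.

```python
def probability_of_identity(allele_freqs):
    """P_I = sum of squared allele frequencies X_i, as defined in the text."""
    return sum(x * x for x in allele_freqs)


def power_of_discrimination(allele_freqs):
    """PD = 1 - P_I."""
    return 1.0 - probability_of_identity(allele_freqs)


# Hypothetical allele frequencies for one locus (they should sum to 1).
freqs = [0.5, 0.3, 0.2]

p_i = probability_of_identity(freqs)   # 0.25 + 0.09 + 0.04
pd = power_of_discrimination(freqs)
print(f"P_I = {p_i:.2f}, PD = {pd:.2f}")
```

A locus with many alleles of roughly equal frequency yields a small P_I and therefore a PD close to 1, which is why highly polymorphic loci are preferred for large repositories.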


Table 1. Matching probabilities of various populations in the GenePrint™ STR
system, using fluorescent detection (Promega Corp., Madison, Wisconsin, USA).

                  Caucasian-American   African-American   Hispanic-American
CTTv Quadriplex   1/6623               1/25575            1/7194
FFFL Quadriplex   1/2632               1/16807            1/3279
Both combined     1/17,400,000         1/430,000,000      1/23,600,000
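
The "Both combined" row of Table 1 is consistent with multiplying the two per-quadriplex matching probabilities. The sketch below reproduces that arithmetic, assuming the two sets of loci are statistically independent; the function name `combined_matching_probability` is an illustrative assumption.

```python
from fractions import Fraction

# Per-quadriplex matching probabilities copied from Table 1 (CTTv, FFFL).
table1 = {
    "Caucasian-American": (Fraction(1, 6623), Fraction(1, 2632)),
    "African-American": (Fraction(1, 25575), Fraction(1, 16807)),
    "Hispanic-American": (Fraction(1, 7194), Fraction(1, 3279)),
}


def combined_matching_probability(*probabilities):
    """Multiply matching probabilities, assuming the locus sets are independent."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result


for population, (cttv, fffl) in table1.items():
    combined = combined_matching_probability(cttv, fffl)
    print(f"{population}: 1 in {combined.denominator:,}")
```

The exact products (for example, 1 in 17,431,736 for the first column) round to the figures given in the table.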


Once the polymorphism rates are known for a series of loci, one can choose
which loci and how many will go into making up the unique identifier. If it is
anticipated that the final number of samples will be relatively small, then
only a few loci are sufficient to form the unique identifier, and one or more
of the loci may not need to have a high PD. On the other hand, if one intends
to store a very large number of samples, then it would be prudent to use more
loci, each with a high PD. The loci will be selected based on considerations
such as PD, anticipated size of the repository, ease of use, applicability to
the organisms being sampled, cost, and availability.
Polymorphisms used
The polymorphisms that can be used in the invention will vary depending on
the types of samples being stored. STRs are well-studied in humans, and kits
are
commercially available for amplifying a number of STR loci. Genetic maps based
on STRs have been built for other organisms (e.g., mouse, rat, pig). STRs
appear to
exist in most higher organisms, and are easy to isolate and characterize.
Because the
methods used to identify and assess STRs are virtually identical for different
organisms, one skilled in the art can isolate STRs in an organism of choice,
assess
the polymorphism rates, and choose those most useful in the present invention.
For
many organisms, STRs and their primer sequences have been published in the
scientific literature. One wishing to use previously published STRs need only
order
those primers (e.g., custom primers can be ordered and received in 48 hours
from
Research Genetics, Huntsville, Alabama, USA), and then use them to amplify the
STRs in the DNA of the collected samples.


It is not necessary to use commercially-available primers to practice the
present invention, nor is it necessary to use microsatellite markers developed
by
others. The present invention allows one to use any polymorphic marker that is
convenient, so long as it provides a power of discrimination between
individuals.
There are many species worthy of study for which no genetic map exists. One of
the reasons that microsatellite markers have become so successful is that they
are easy to develop for previously unstudied organisms. One already familiar
with an organism for which there are no microsatellite markers can develop them
with relative ease using methods well-known in the art.
Uses of the invention
The method described herein will be of particular use in a pathology
laboratory or testing facility, or a large-scale cryogenic repository.
Maintaining the
integrity of the sample labels is of paramount importance in these situations,
as
quality control problems often result from failure of the record-keeping
system.
Naturally, such a method will also be of use to blood banks, tissue banks, and
veterinary hospitals and testing facilities.
The method can also be used by large repositories to identify misplaced or
misidentified samples. For example, a tissue bank may take a small piece
(e.g., a sample) of a stored tissue (e.g., a source) for testing (e.g., tissue
typing for a potential recipient of the tissue). If the identification were
disassociated from the sample (e.g., the label fell off the test tube), those
test results would normally be lost. Using the unique identifier of the present
invention, however, one would simply test the sample for the polymorphic loci,
and recreate the unique identifier. The sample (and the tissue typing test
results) could then be reassociated with the source in storage.
The method is especially useful to maintain the long-term integrity of
samples and associated information, especially in tissue repositories. Many
biomedical studies require analysis of tissue samples from large populations
of
individuals with known medical, dietary, genetic, social, and cultural
backgrounds.
During the course of a study requiring several years to complete, it may be
necessary
to test a particular sample several times. It is therefore vital to the
accuracy of the
study to confidently re-retrieve that sample.
Many biomedical studies involve analysis of a group of individuals with a set
of characteristics in common (e.g., cigarette smoking, ethnic background,
incidence
of particular cancers). At present, the amount of time and effort involved in
assembling a set of individuals appropriate for a study may be greater than
the effort
of conducting the study itself. If samples from a large number of individuals
could
be collected in a repository along with associated data on the individuals,
then it
should be possible to "assemble" a set of individuals for a given study by
selecting
samples from these individuals chosen on the basis of a defined set of
characteristics. For example, if a blood sample repository contained samples
from
100,000 individuals, and associated medical data on those same individuals in
computerized form, then a medical study could be conducted by selecting
individuals with desired characteristics (as listed in the computerized
medical data),
and then retrieving samples (or more likely, sub-samples) from those
individuals,
which are held in the repository. The method of the present invention is
useful in
establishing such a repository, because the method greatly reduces the
likelihood of
samples being misidentified and allows confident re-retrieval of samples.
Another advantage of the invention is that if the unique identifier is de-
associated from the sample within the repository (e.g., the label falls off
the tube)
analysis of the polymorphisms in the sample allows re-creation of the unique
identifier.
Use of the method of the invention also provides a method for preventing
repository deposit of samples from duplicate sources, because when the unique
identifier is created for a sample from a new source, one need only search the
repository records for that same unique identifier to see if the source is
already
represented by a sample in the repository.
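
The duplicate check and the re-association of an unlabeled sample described above both reduce to a lookup keyed on the unique identifier. The following is a minimal in-memory sketch; the class `SampleRepository`, its method names, and the example records are hypothetical and stand in for whatever record-keeping system a repository actually uses.

```python
class SampleRepository:
    """Toy in-memory index keyed by unique identifier (illustrative only)."""

    def __init__(self):
        self._sources = {}  # unique identifier -> source record

    def deposit(self, identifier, source_record):
        """Add a new source; return False if that identifier is already present."""
        if identifier in self._sources:
            return False
        self._sources[identifier] = source_record
        return True

    def reassociate(self, identifier):
        """Look up the source for a re-created identifier (None if unknown)."""
        return self._sources.get(identifier)


repo = SampleRepository()
print(repo.deposit("A35B27", {"donor": "hypothetical donor 1"}))  # first deposit
print(repo.deposit("A35B27", {"donor": "hypothetical donor 1"}))  # duplicate source
print(repo.reassociate("A35B27"))  # identifier re-created from an unlabeled tube
```

A real repository would presumably back this with a database and audit trail; the point of the sketch is only that the identifier derived from the sample itself serves as the lookup key.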
The invention also has uses outside of the medical field. Because of the
increasing ease with which samples from various sources (e.g., plant, animal,
microbial, fungal, viral) can be tested for polymorphisms, the invention is
applicable in any situation where a large number of biological samples may be
stored. An
example of such a situation would be a field study of biodiversity of a wild
population. New tests for assessing diversity (i.e., assessing polymorphism)
are
continually being created, and samples from previous collecting expeditions
represent a "snapshot" of the biodiversity that existed in the past. Such
previously-
collected samples can be re-tested using current techniques, but the results
are only
useful if the integrity of the sample designations is still sound, and the
samples can
be linked to their original collection data. Maintenance of quality of the
record
keeping is especially important if the field samples are from species which
are
endangered or extinct. The method of the invention provided here has
potential uses in studies of population genetics, evolutionary genetics, and
ecology. In studies of flora and fauna from locales that are either increasing
or decreasing in pollution, for example, it is necessary to both store the
samples for a period of time and also maintain their identification. Such
sampling at periodic intervals is also a requirement of an effective
bioremediation plan.
There are many situations where one would want to keep a biological sample
for a period of time against the possibility of testing it again later. For
example, even if one has conducted a population genetics study on a series of
samples (collection of organisms), a new test developed at a future time may
allow the testing of different hypotheses, and provide the answer to new
questions, without necessitating collection of new samples in the field.
Therefore, the method of the invention described herein would be especially
useful in maintaining collections of samples from endangered species. The
unique identifier and identity of each sample can be re-verified from the
sample itself.
This sample identification method can be used to keep track of samples in
any study or collection where there are a large number of biological samples
being stored for a period of time, and where there is a chance that samples may
become misplaced or mislabeled.
EXAMPLES
Example 1: Use of STRs to Create a Unique Identifier
A biological sample is obtained from a human, and an aliquot is taken for
polymorphism testing. DNA is isolated by methods well known in the art
(Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, New York; Ausubel, F.M. et al., eds., Current Protocols in
Molecular Biology). An amount of this isolated DNA is removed, GenePrint™
primers (Promega Corp., Madison, WI) for the CSF1PO locus are added to it, and
amplification is carried out, all according to the manufacturer's instructions
(supplemental information on thermocycling is well known in the art, see e.g.,
Innis, M.S., et al. (1990) PCR Protocols: A Guide to Methods and Applications,
Academic Press, Inc. San Diego, CA). After amplification, fluorescence
detection is carried out, also according to the manufacturer's recommendations.
The process is repeated for the other loci in the CTTv Quadriplex (TPOX, TH01,
vWA), and also the four loci in the FFFL Quadriplex (F13A01, FESFPS, F13B, and
LPL).
Once detection of the polymorphism(s) is complete, the unique identifier can
be created for the sample. For a system such as these eight loci, where the
alleles are 3 to 7 repeats in length, a convenient conversion method is to
simply list each locus by letter, followed by the two alleles for that locus.
For a sample with alleles of 3 and 5 tandem repeats at the first locus, alleles
of 2 and 7 repeats at the second, etc., the unique identifier would be
"A35B27...HXY". The precise conversion method could be varied, depending on the
number of repeats in the loci, e.g., a locus with 3 - 12 repeats would require
4 digits after the locus letter.
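
The letter-plus-alleles conversion described in Example 1 can be sketched as follows. The function name `make_identifier` and the use of zero-padding to reach a fixed width per allele are assumptions made for illustration; the text only states that a locus with 3 - 12 repeats needs 4 digits after the locus letter.

```python
import string


def make_identifier(allele_pairs, digits=1):
    """Build an identifier such as "A35B27" from per-locus allele pairs.

    allele_pairs: list of (allele1, allele2) repeat counts, in locus order.
    digits: fixed width per allele (zero-padded; padding is an assumption).
    """
    parts = []
    for letter, pair in zip(string.ascii_uppercase, allele_pairs):
        for allele in pair:
            if len(str(allele)) > digits:
                raise ValueError(f"allele {allele} does not fit in {digits} digit(s)")
        parts.append(letter + "".join(f"{allele:0{digits}d}" for allele in pair))
    return "".join(parts)


print(make_identifier([(3, 5), (2, 7)]))              # first two loci of the example
print(make_identifier([(3, 12), (9, 10)], digits=2))  # two digits per allele
```

Fixing the width per allele keeps the identifier unambiguous: without it, "A312" could be read as alleles 3 and 12 or as 31 and 2.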
Example 2: Preparation of DNA from Samples of Whole Blood
Red blood cells lack DNA because they are enucleated, and must therefore
be lysed to facilitate their separation from white blood cells, which contain
genomic DNA. After the red blood cells are lysed and removed, the white blood
cells are then lysed with an anionic detergent in the presence of a DNA
stabilizer, which limits the activity of DNase. Contaminating RNA is then
degraded with RNase, and the RNA, proteins, and other contaminants are then
removed by salt precipitation. The genomic DNA is recovered by alcohol
precipitation, dissolved in TE buffer, and stored. Because the genomic DNA will
be used in a nucleic acid amplification method, it is advisable to also have a
"blank" control tube (containing reagents but no blood) accompany the blood
sample tube through the extraction process. After extraction, the "DNA" from
the "blank" control tube would be amplified to ensure that no extraneous DNA
has contaminated the extraction process.
Isolation of genomic DNA from whole blood can be accomplished by
following any of a variety of protocols, including using the PUREGENE® kit
(Gentra Systems, Minneapolis, Minnesota, USA), and following the manufacturer's
instructions. Place 30 ml of RBC Lysis Solution into a 50 ml tube, add 7 ml to
10 ml of whole blood, mix by inverting several times, and incubate for 10
minutes at room temperature. Invert the tube again at least once during the
incubation. Centrifuge the tube for 10 minutes at 2,000 x g, pour off the
supernatant, leaving behind the visible white cell pellet and about 200 µl of
residual liquid. Vortex the tube vigorously for 20 seconds to resuspend the
cells in the residual liquid. Add 10 ml of Cell Lysis + RNase A (made fresh
that day), and vortex on high speed for 10 seconds. Incubate the tube at 37°C
for 15 to 30 minutes to allow digestion of the RNA.
Cool the sample to room temperature by placing in an ice bath for 10
minutes. Add 3.33 ml of the Protein Precipitation Solution (Gentra Systems,
Minneapolis, Minnesota, USA) into the tube. Vortex at high speed for 20 seconds
to mix uniformly. Centrifuge at 2,000 x g for 10 minutes. If a tight, dark
brown pellet is not formed, repeat the 20-second vortex, followed by a 5-minute
incubation on ice, and repeat the 10-minute centrifugation at 2,000 x g.
Pour off the supernatant into a clean 50 ml tube containing 10 ml of 100%
isopropanol. Mix by inverting gently 50 times (do not vortex, or the DNA will
be
sheared). The DNA is stable at this point, and can be stored indefinitely in
the
isopropanol.
Centrifuge at 2,000 x g for 3 minutes. Carefully pour off the supernatant,
leaving behind the white pellet, and drain the tube upside down on clean
absorbent
paper. Add 10 ml of 70% ethanol, and wash the pellet by inverting gently,
avoiding
dislodging the pellet. Centrifuge at 2,000 x g for 1 minute. Carefully pour
off the ethanol, leaving the pellet behind. Invert carefully so as to not
dislodge the pellet, and drain the tube on clean absorbent paper for 10
minutes.
Add 1 ml of DNA Hydration Solution (Gentra Systems, Minneapolis,
Minnesota, USA), and rehydrate the DNA by incubating at 65°C for 1 hour,
overnight at room temperature, and at 65°C for 1 hour the next day. Tap the
tube periodically to help disperse the DNA. The DNA in solution can be stored
indefinitely at 4°C.
Example 3: Preparation of Blood Sample With FTA™ Paper
A blood sample is drawn from a human. Two µl of blood are placed on a
piece of FTA™ paper (FITZCO, Inc., Maple Plain, Minnesota, USA), dried, and
stored until ready to be processed.
To analyze the polymorphisms in the sample, a 1 mm disc is punched
directly into a 2 ml microcentrifuge tube, and 200 µl of FTA™ purification
reagent is placed on the disc. The tube is capped, vortexed for 3-5 seconds,
then centrifuged in a microcentrifuge at 12,000 x g for 30 seconds. The wash
solution is then aspirated and discarded. The wash is then repeated with
another 200 µl of purification reagent. After the second wash solution has been
aspirated and discarded, the disc is washed twice with TE as follows: 200 µl of
TE buffer is added, and the disc vortexed for 3-5 seconds; the tube and disc
are then centrifuged at 12,000 x g for 30 seconds and the filtrate removed and
discarded. After the disc has been washed twice with TE, the disc is subjected
to polymorphism analysis.
A 1 mm punch of FTA™ paper containing a blood sample, processed as
described in Example 2, supra, is placed in a 0.5 ml tube, and tested with the
AmpFlSTR Profiler Plus™ system (Perkin Elmer Applied Biosystems, Foster City,
California, USA), according to the manufacturer's instructions. In general, to
the tube is added 10.5 µl of Profiler Plus Reaction Mixture, 0.5 µl of Taq
Gold, and 5.5 µl of Primer Mixture. The tube is sealed, and placed in a
thermocycler under the following conditions: 95°C for 11 minutes, followed by
24 cycles of 94°C for 1 minute, 59°C for 1 minute, 72°C for one minute. After
the 25th cycle, the reaction mixture is placed at 60°C for up to 83 minutes.
After thermocycling is complete, the reaction is held at 4°C until ready for
gel electrophoresis.
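
When several punches are amplified together, the per-tube volumes given above (10.5, 0.5, and 5.5 µl) can be scaled into a single master mix. The sketch below does that arithmetic; the function name `master_mix` and the 10% pipetting overage are assumptions for illustration and are not taken from the manufacturer's instructions.

```python
# Per-reaction volumes in microlitres, as listed in the text above.
PER_REACTION_UL = {
    "Profiler Plus Reaction Mixture": 10.5,
    "Taq Gold": 0.5,
    "Primer Mixture": 5.5,
}


def master_mix(n_samples, overage=0.10):
    """Scale per-reaction volumes for n_samples tubes plus a fractional overage.

    The default 10% overage is an assumed allowance for pipetting loss.
    """
    scale = n_samples * (1 + overage)
    return {name: round(volume * scale, 2) for name, volume in PER_REACTION_UL.items()}


print(sum(PER_REACTION_UL.values()))  # total volume added per tube
print(master_mix(7))                  # e.g. seven samples
```

Each tube receives 16.5 µl in total, so a batch is dispensed at 16.5 µl per punch once the components are combined.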
Five µl of amplification product (produced as described above in Example 3)
are mixed with 5 µl of 2X loading buffer (0.25% Bromphenol Blue, 12.5% Ficoll
400, 50 mM EDTA, 5X TAE (10X TAE: 0.4 M Tris, 40 mM Na Acetate
Trihydrate, 10 mM EDTA, pH to 7.9 with acetic acid)). The 10 µl mixture is
loaded into a well in a 1% agarose gel prepared with TE buffer and containing
0.5 µg of ethidium bromide per ml of agarose gel. An appropriate size ladder is
also loaded on the gel. The gel is then electrophoresed in TAE buffer for 1
hour at 100 volts, and then illuminated with UV light on a transilluminator,
and photographed. The bands in the photograph are then compared to the
literature supplied by the manufacturer to determine the precise alleles
present in the sample.
Example 6: Creation of the Unique Identifier
A set of blood samples was prepared and tested with the AmpFlSTR Profiler
Plus™ system as described in Examples 2 through 4, and the results are shown
in Table 2.
Table 2. Alleles found in seven human individuals when tested for eight STR
loci in the AmpFlSTR Profiler Plus™ system (PE Applied Biosystems, Foster City,
California, USA).

Locus        #1      #2       #3        #4      #5      #6        #7
D3S1358      15      14,15    15,18     16,17   15,17   17,18     15
vWA          15,18   15,16    13,14     16,17   15,20   17,19     14,15
FGA          19,24   20,21    22,24     23      21,22   21,23     24,28
Amelogenin   X,Y     X,Y      X,Y       X,Y     X       X         X
D8S1179      12,15   13,15    12,14     13,15   14      13        14,15
D21S11       33.2    29,30    30,31.2   29,32   28,31   28,33.2   32.2,38
D5S818       12      11,13    8,12      10,12   13,14   11        12,13
D13S317      12,13   11,13    11,12     9,12    12,14   9,14      12
D7S820       8,9     8,12     10,11     8,11    10,12   10,11     9,10
D18S51       11,15   9,14.2   13.2,20   18,24   9,18    10,12     10,18


The polymorphism data for a sample can be coded in a number of ways. The
raw data for individual #1, for example, is as follows:
D3S1358 15;vWA 15,18;FGA 19,24;Amelogenin X,Y;D8S1179 12,15;
D21S11 33.2;D5S818 12;D13S317 12,13;D7S820 8,9;D18S51 11,15
This data can be used "raw" as the unique identifier (i.e., "as is," as
above), with no alteration. For repositories with very large numbers of
samples, this may be desirable, as it is the most "foolproof" method.
Alternatively, the STR loci can be "coded," i.e., each locus represented by a
combination of numbers or letters, e.g., D3S1358 can be represented by "A" or
"01," vWA by "B" or "02," etc. The raw data so coded would then be:
A,15,B,15,18,C,19,24,D,X,Y,E,12,15,F,33.2,G,12,H,12,13,I,8,9,J,11,15, or
01,15,02,15,18,03,19,24,04,X,Y,05,12,15,06,33.2,07,12,08,12,13,09,8,9,10,11,15
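
The two coded forms for individual #1 can be generated from the raw data with a short routine. The sketch below is illustrative only; the function name `code_profile` and the `scheme` parameter are assumptions, and the locus order is the order in which the raw data is listed above.

```python
import string

# Raw results for individual #1, in the order listed in the text.
raw_profile = [
    ("D3S1358", ["15"]), ("vWA", ["15", "18"]), ("FGA", ["19", "24"]),
    ("Amelogenin", ["X", "Y"]), ("D8S1179", ["12", "15"]), ("D21S11", ["33.2"]),
    ("D5S818", ["12"]), ("D13S317", ["12", "13"]), ("D7S820", ["8", "9"]),
    ("D18S51", ["11", "15"]),
]


def code_profile(profile, scheme="letter"):
    """Replace each locus name by a letter ("A", "B", ...) or a two-digit number."""
    fields = []
    for index, (_locus, alleles) in enumerate(profile):
        if scheme == "letter":
            fields.append(string.ascii_uppercase[index])
        else:
            fields.append(f"{index + 1:02d}")
        fields.extend(alleles)
    return ",".join(fields)


print(code_profile(raw_profile, scheme="letter"))
print(code_profile(raw_profile, scheme="number"))
```

The letter scheme is limited to 26 loci as written; the numeric scheme extends further at the cost of longer identifiers.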
All patents, patent applications, and references cited above are hereby
incorporated by reference in their entirety. While this invention has been
particularly shown and described with references to preferred embodiments
thereof,
it will be understood by those skilled in the art that various changes in form
and
details may be made therein without departing from the spirit and scope of the
invention as defined by the appended claims.


Administrative Status

Title Date
Forecasted Issue Date Unavailable
(86) PCT Filing Date 1999-02-25
(87) PCT Publication Date 1999-09-02
(85) National Entry 2000-08-23
Dead Application 2003-02-25

Abandonment History

Abandonment Date Reason Reinstatement Date
2002-02-25 FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type                                   Anniversary Year   Due Date     Amount Paid   Paid Date
Application Fee                                                            $300.00       2000-08-23
Maintenance Fee - Application - New Act    2                  2001-02-26   $100.00       2001-01-15
Registration of a document - section 124                                   $100.00       2001-04-04
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
GENOMICS COLLABORATIVE, INC.
Past Owners on Record
BING, DAVID H.
WILLIAMSON, JANICE M.
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Description            2000-08-23          20                1,094
Abstract               2000-08-23          1                 53
Cover Page             2000-12-01          1                 34
Claims                 2000-08-23          5                 166
Correspondence         2000-11-21          1                 2
Assignment             2000-08-23          3                 114
PCT                    2000-08-23          10                321
Assignment             2001-04-04          8                 325
Correspondence         2001-05-01          1                 2
Assignment             2001-05-15          1                 36