Patent 2160457 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 2160457
(54) English Title: RANDOM CHEMISTRY FOR THE GENERATION OF NEW COMPOUNDS
(54) French Title: CHIMIE ALEATOIRE POUR L'OBTENTION DE NOUVEAUX COMPOSES
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • C12P 1/00 (2006.01)
  • C07B 61/00 (2006.01)
  • C07H 21/00 (2006.01)
  • C07K 1/00 (2006.01)
  • C07K 1/04 (2006.01)
  • C12N 9/00 (2006.01)
  • C12N 15/10 (2006.01)
  • C12P 19/34 (2006.01)
  • C12Q 1/68 (2006.01)
(72) Inventors :
  • KAUFFMAN, STUART A. (United States of America)
  • REBEK, JULIUS, JR. (United States of America)
(73) Owners :
  • KAUFFMAN, STUART A. (United States of America)
(71) Applicants :
  • KAUFFMAN, STUART A. (United States of America)
(74) Agent: RIDOUT & MAYBEE LLP
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 1994-04-19
(87) Open to Public Inspection: 1994-10-27
Examination requested: 1995-10-12
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/US1994/004314
(87) International Publication Number: WO1994/024314
(85) National Entry: 1995-10-12

(30) Application Priority Data:
Application No. Country/Territory Date
08/049,268 United States of America 1993-04-19

Abstracts

English Abstract






Methods for the generation of new compounds are disclosed. The present invention eliminates the need to know in advance the
structure or chemical composition of a compound having a desired property. The disclosure of the present invention provides that a diversity
of unknown compounds may be produced by "random" chemistry, and such a diversity of unknown compounds may be screened for one
or more desired properties to detect the presence of suitable compounds. In one aspect, a starting group of organic compounds is caused
to undergo a series of chemical reactions to create a diversity of new organic compounds that are screened for the presence of organic
compounds having the desired property. In another aspect of the present invention, a diversity of compounds is generated from a group of
substrates which are subjected to a group of enzymes representing a diversity of catalytic activities.


Claims

Note: Claims are shown in the official language in which they were submitted.




1. A method for the production of an organic molecule having
a desired property, comprising the steps of:
(a) providing a starting group of different organic
molecules;
(b) causing at least one chemical reaction to take place
with at least some of the different organic molecules in the starting
group to create an intermediate reaction mixture having one or
more organic molecules different from the organic molecules in
the starting group;
(c) repeating step (b) at least once by substituting the
intermediate reaction mixture as the starting group to thereby
produce a final reaction mixture as a result of the last repetition;
and
(d) screening the final reaction mixture resulting from
step (c) for the presence of the organic molecule having the
desired property.

2. The method of claim 1 further comprising the step of
isolating from the final reaction mixture the organic molecule having the
desired property.

3. The method of claim 1 further comprising the step of
determining the structure or functional properties characterizing the
organic molecule having the desired property.

4. The method of claim 3 further comprising the step of
synthesizing the organic molecule having the desired property.


5. The method of claim 1 further comprising the step of
adding more of the starting group of different organic molecules to the
intermediate reaction mixture after at least one repetition of step (b).

6. The method of claim 1 wherein the different organic
molecules of the starting group all share a common core structure.

7. The method of claim 1 wherein the different organic
molecules of the starting group are selected from the group consisting of
alkanes, alkenes, alkynes, arenes, alcohols, ethers, amines, aldehydes,
ketones, acids, esters, amides, cyclic compounds, heterocyclic
compounds, organometallic compounds, hetero-atom bearing
compounds, amino acids, nucleotides, and mixtures thereof.

8. The method of claim 7 wherein the different organic
molecules of the starting group are selected from the group consisting of
acids, amines, alcohols, amino acids, nucleotides, and unsaturated
compounds.

9. The method of claim 8 wherein the different organic
molecules of the starting group are selected from the group consisting of
amino acids and nucleotides.

10. The method of claim 1 wherein the at least one chemical
reaction for each repetition of step (b) is independently selected from the
group consisting of substitution, addition, elimination, rearrangement,
dehydration, reduction, oxidation, condensation, hydrogenation,
dehydrogenation, dimerization, epoxidation, isomerization, cyclization,
decyclization, halogenation, sulfonation, alkylation, acylation, nitration,
hydrolysis, esterification, transesterification, carboxylation,
decarboxylation, amination, and deamination.

11. The method of claim 1 wherein the chemical reaction is
caused by changing the conditions of the intermediate reaction mixture,
by taking a step selected from the group consisting of adding water,
removing water, adding air, adding oxygen, adding ammonia, changing
temperature, changing pressure, adding an oxidizing agent, adding a
reducing agent, adding a source of radiation, adding a hydroxylating
agent, adding a hydrogenating agent, adding a dehydrogenating agent,
adding an epoxidizing agent, adding a halogenating agent, adding a
sulfonating agent, adding an alkylating agent, adding an acylating agent,
adding a nitrating agent, adding a hydrolytic agent, adding a
carboxylating agent, adding a decarboxylating agent, changing
concentration, adding a new solvent, changing pH, and adding a
catalyst.

12. The method of claim 1 wherein the at least one chemical
reaction is caused by adding a set of different enzymes.

13. The method of claim 12 wherein at least 10,000 different
enzymes are added.

14. The method of claim 13 wherein at least 1,000,000 different
enzymes are added.

15. The method of claim 14 wherein at least 100,000,000
different enzymes are added.

16. The method of claim 1 wherein the conditions causing the
chemical reactions of steps (b) and (c) are the same.


17. The method of claim 1 further comprising the step of using
a selection method on the intermediate reaction mixture to produce a
subset of organic molecules with a higher likelihood of producing the
organic molecule having the desired property.

18. The method of claim 17 wherein the selection method
comprises using a chemostat.

19. The method of claim 1 wherein at least one agent, selected
from the group consisting of oxidizing agents, reducing agents,
hydrogenating agents, dehydrogenating agents, hydroxylating agents,
hydrogenating agents, dehydrogenating agents, epoxidizing agents,
halogenating agents, sulfonating agents, alkylating agents, acylating
agents, nitrating agents, hydrolytic agents, carboxylating agents, and
decarboxylating agents, is added during at least one repetition of step
(b).

20. The method of either claim 1 or claim 7 wherein the starting
group contains at least 10 different organic molecules.

21. The method of claim 20 wherein the starting group contains
at least 100 different organic molecules.

22. The method of claim 21 wherein the starting group contains
at least 1,000 different organic molecules.

23. A method for the production of an organic molecule having
a desired property comprising the steps of:
(a) providing a starting group of different organic
molecules;
(b) causing at least one chemical reaction to take place
with at least some of the different organic molecules in the starting
group to create an intermediate reaction mixture having one or
more organic molecules different from the organic molecules in
the starting group;
(c) repeating step (b) at least once by substituting the
intermediate reaction mixture as the starting group to thereby
produce a final reaction mixture as a result of the last repetition;
(d) screening the final reaction mixture resulting from
step (c) for the presence of the organic molecule having the
desired property; and
(e) if the organic molecule is found in the final reaction
mixture then performing the following additional steps:
(1) dividing the starting group of different organic
molecules into at least two subgroups each containing less
than all of the different organic molecules in the starting
group;
(2) performing steps (b) and (c) on each of the
subgroups in the same way as performed with the starting
group to produce a final reaction submixture corresponding
to each of the subgroups;
(3) screening each of the final reaction
submixtures resulting from step (2) for the presence of the
organic molecule having the desired property; and
(4) repeating at least once steps (1) through (3)
for at least one of the successful subgroups from which the
organic molecule having the desired property is produced
by substituting the successful subgroup as the subgroup in
step (1) to thereby identify a narrowed group of different
organic molecules from which the compound having the
desired property can be produced.

24. A method for the production of an organic molecule having
a desired property, comprising the steps of:
(a) providing a starting group of different organic
molecules;
(b) causing at least one chemical reaction to take place
with at least some of the different organic molecules in the starting
group to create an intermediate reaction mixture having one or
more organic molecules different from the organic molecules in
the starting group;
(c) repeating step (b) at least once by substituting the
intermediate reaction mixture as the starting group to thereby
produce a final reaction mixture as a result of the last repetition;
(d) screening the final reaction mixture resulting from
step (c) for the presence of the organic molecule having the
desired property; and
(e) if the organic molecule having the desired property
is found in the final reaction mixture, then performing the following
additional steps:
(1) providing at least two additional starting
groups of different organic molecules, each additional
starting group corresponding to the starting group of step
(a);
(2) performing steps (b) and (c) on each of the
additional starting groups in the same way as performed
with the starting group of step (a) with the exception that,
for each of the additional starting groups, at least one of
the chemical reactions is eliminated to thereby produce an
additional final reaction mixture from each of the additional
starting groups;
(3) screening each of the additional final reaction
mixtures resulting from step (2) for the presence of the
organic molecule having the desired property;
(4) repeating, at least once, steps (1) through (3)
for at least one of the successful additional starting groups
from which the organic molecule having the desired
property is produced by substituting the successful
additional starting group as the additional starting group in
step (1) to thereby identify a narrowed group of chemical
reactions from which the compound having the desired
property can be produced.

25. A method for the production of an organic molecule having
a desired property, comprising the steps of:
(a) providing a starting group of at least 100 different
organic molecules selected from the group consisting of alkanes,
alkenes, alkynes, arenes, alcohols, ethers, amines, aldehydes,
ketones, acids, esters, amides, cyclic compounds, heterocyclic
compounds, organometallic compounds, hetero-atom bearing
compounds, amino acids, nucleotides, and mixtures thereof;
(b) causing at least one chemical reaction selected from
the group consisting of substitution, addition, elimination,
rearrangement, dehydration, reduction, oxidation, condensation,
hydrogenation, dehydrogenation, dimerization, epoxidation,
isomerization, cyclization, decyclization, halogenation, sulfonation,
alkylation, acylation, nitration, hydrolysis, esterification,
transesterification, carboxylation, decarboxylation, amination, and
deamination to take place with at least some of the different
organic molecules in the starting group to create an intermediate
reaction mixture having one or more organic molecules different
from the organic molecules in the starting group;
(c) repeating step (b) at least once by substituting the
intermediate reaction mixture as the starting group to thereby
produce a final reaction mixture as a result of the last repetition;
(d) screening the final reaction mixture resulting from
step (c) for the presence of the organic molecule having the
desired property;
(e) isolating from the final reaction mixture the organic
molecule having the desired property; and
(f) determining the structure or functional properties
characterizing the organic molecule having the desired property.

26. The method of claim 25 wherein the different organic
molecules of the starting group all share a common core structure.

27. The method of claim 26 further comprising the step of
using a selection method on the intermediate reaction mixture to
produce a subset of organic molecules with a higher likelihood of
producing the organic molecule having the desired property.

28. The method of claim 27 wherein the at least one chemical
reaction is caused by adding a set of different enzymes.

29. A method for the production of an organic molecule having
a desired property, comprising the steps of:
(a) reacting a group of different substrates, the group
comprising acids, amines, alcohols, and unsaturated compounds,
under suitable conditions with a dehydrating agent to yield a first
reaction mixture;
(b) reacting the first reaction mixture with a reducing
agent under suitable conditions to yield a second reaction
mixture;
(c) reacting the second reaction mixture with an
oxidizing agent under suitable conditions to yield a third reaction
mixture;
(d) performing a condensation reaction under suitable
conditions upon the third reaction mixture to yield a fourth
reaction mixture;
(e) exposing the fourth reaction mixture to light with a
wavelength of about 220 nanometers to 600 nanometers, thereby
producing one or more organic molecules different from the
substrates and agents;
(f) screening the exposed fourth reaction mixture for the
presence of the organic molecule having the desired property;
and
(g) isolating from the exposed fourth reaction mixture
the organic molecule having the desired property.

30. A method of generating for characterization an organic
molecule having a desired property, comprising the steps of:
(a) reacting a group of different substrates, the group
comprising acids, amines, alcohols, and unsaturated compounds,
under suitable conditions with a dehydrating agent to yield a first
reaction mixture;
(b) reacting the first reaction mixture with a reducing
agent under suitable conditions to yield a second reaction
mixture;
(c) reacting the second reaction mixture with an
oxidizing agent under suitable conditions to yield a third reaction
mixture;
(d) performing a condensation reaction under suitable
conditions upon the third reaction mixture to yield a fourth
reaction mixture;
(e) exposing the fourth reaction mixture to light with a
wavelength of about 220 nanometers to 600 nanometers, thereby
producing one or more organic molecules different from the
substrates and agents;
(f) screening the exposed fourth reaction mixture for the
presence of the organic molecule having the desired property;
and
(g) determining the structure or functional properties
characterizing the organic molecule having the desired property.

31. The method of claim 30, additionally including, prior to step
(g), isolating from the reaction mixture the organic molecule having the
desired property.

32. The method of either claim 29 or claim 30, additionally
including, prior to step (f), repeating steps (a)-(e) with or without
introducing additional substrates.

33. The method of either claim 29 or claim 30 wherein the
order in which the substrates are subjected to the reactions of steps
(a)-(e) is varied.

34. The method of either claim 29 or claim 30, further including
after step (g), producing the organic molecule having the desired
property.


35. The method of either claim 29 or claim 30 wherein the
desired property is the ability to function as a drug, a vaccine, a ligand, a
catalyst, a catalytic cofactor, a structure of use, a detector molecule, or a
building block for another compound.

36. A method for the production of an organic molecule having
a desired property, comprising the steps of:
(a) reacting a group of different enzymes representing a
diversity of catalytic activities under suitable conditions with a
group of different substrates to create a reaction mixture, thereby
producing one or more organic molecules different from the
enzymes and substrates in the reaction mixture;
(b) screening the reaction mixture for the presence of
the organic molecule having the desired property; and
(c) isolating from the reaction mixture the organic
molecule having the desired property.

37. A method of generating for characterizing an organic
molecule having a desired property, comprising the steps of:
(a) reacting a group of different enzymes representing a
diversity of catalytic activities under suitable conditions with a
group of different substrates to create a reaction mixture, thereby
producing one or more organic molecules different from the
enzymes and substrates in the reaction mixture;
(b) screening the reaction mixture for the presence of
the organic molecule having the desired property; and
(c) determining the structure or functional properties
characterizing the organic molecule having the desired property.


38. The method of claim 37, additionally including, prior to step
(c), isolating from the reaction mixture the organic molecule having the
desired property.

39. The method of either claim 36 or claim 37 wherein the
group of different substrates is selected from the group consisting of
alkanes, alkenes, alkynes, arenes, alcohols, ethers, amides, aldehydes,
ketones, acids, esters, amides, cyclic compounds, heterocyclic
compounds, organometallic compounds, hetero-atom bearing
compounds, amino acids, nucleotides, and mixtures thereof.

40. The method of claim 39 wherein the group of different
substrates is selected from the group consisting of acids, amines,
alcohols, amino acids, nucleotides, and unsaturated compounds.

41. The method of claim 40 wherein the group of different
substrates is selected from the group consisting of amino acids and
nucleotides.

42. The method of claim 39 wherein the group of different
substrates contains at least 100 different organic molecules.

43. The method of claim 42 wherein the group of different
substrates contains at least 1,000 different organic molecules.

44. The method of either claim 36 or claim 37, further
comprising after step (c), producing the organic molecule having the
desired property.


45. The method of either claim 36 or claim 37, wherein the
desired property is the ability to function as a drug, a vaccine, a ligand, a
catalyst, a catalytic cofactor, a structure of use, a detector molecule, or a
building block for another compound.

46. The method of either claim 36 or claim 37 wherein the
group of different enzymes comprises at least 10,000 different enzymes.

47. The method of claim 46 wherein the group of different
enzymes comprises at least 1,000,000 different enzymes.

48. The method of claim 47 wherein the group of different
enzymes comprises at least 100,000,000 different enzymes.

49. The method of either claim 36 or claim 37, wherein the
substrates of the group of different substrates all share a common core
structure.

50. The method of either claim 36 or claim 37, further
comprising the step of using a selection method on the reaction mixture
to produce a subset of organic molecules with a higher likelihood of
producing the organic molecule having the desired property.
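
The reaction-narrowing procedure of claim 24 can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: molecules are modelled as strings, and the three "reactions" (`reverse`, `dimerize`, `tag_x`), the helper names, and the target string are all hypothetical and do not come from the application. Each trial repeats the full sequence with exactly one reaction eliminated, and a successful trial becomes the new sequence.

```python
def reverse(mixture):
    """Hypothetical toy reaction: add the reversed form of every molecule."""
    return set(mixture) | {m[::-1] for m in mixture}


def dimerize(mixture):
    """Hypothetical toy reaction: add the self-dimer of every molecule."""
    return set(mixture) | {m + m for m in mixture}


def tag_x(mixture):
    """Hypothetical toy reaction: add an 'x'-tagged copy of every molecule."""
    return set(mixture) | {m + "x" for m in mixture}


def produces_hit(starting_group, reactions, has_property):
    """Run the reactions in order, then screen the final mixture."""
    mixture = set(starting_group)
    for react in reactions:
        mixture = react(mixture)
    return any(has_property(m) for m in mixture)


def narrow_reactions(starting_group, reactions, has_property):
    """Repeatedly drop one reaction while the screen still finds a hit."""
    current = list(reactions)
    if not produces_hit(starting_group, current, has_property):
        return None  # nothing to narrow: the full sequence gives no hit
    while len(current) > 1:
        # One trial per reaction, each with that single reaction eliminated.
        trials = [current[:i] + current[i + 1:] for i in range(len(current))]
        successful = [t for t in trials
                      if produces_hit(starting_group, t, has_property)]
        if not successful:
            break  # every remaining reaction is needed
        current = successful[0]
    return current


narrowed = narrow_reactions({"AB"}, [reverse, dimerize, tag_x],
                            lambda m: m == "ABABx")
names = [r.__name__ for r in narrowed]
print(names)
```

In this toy run the `reverse` step is found to be dispensable, while `dimerize` and `tag_x` are both retained because removing either one loses the hit.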

Description

Note: Descriptions are shown in the official language in which they were submitted.


WO 94/24314 2160457 PCT/US94/04314

RANDOM CHEMISTRY FOR
THE GENERATION OF NEW COMPOUNDS

Cross-Reference to Related Application
The present application is a continuation-in-part of U.S.
patent application 08/049,268, filed April 19, 1993, the entire disclosure
of which is hereby incorporated by reference.

Technical Field
The present invention relates generally to the generation of
new compounds without predetermining a desired structure or
composition, and the screening of such compounds for one or more
desired properties. This invention is more particularly related to the use
of a random chemistry, with or without enzymes, to generate a variety of
new compounds from which those with a desired property may be
characterized or identified, e.g., for subsequent production in batch
quantities by conventional methodologies or otherwise.

Background of the Invention
Humankind's attempt to acquire new and useful
compounds has been one of the more interesting but problematic
endeavors, especially with respect to medically useful compounds. In
general, the traditional approaches to the acquisition of new compounds
have been either isolation of natural products (i.e., isolation of
compounds found in nature) or synthetic preparation. Discovery of new
and useful compounds via natural products is hampered by a variety of
problems, including the availability of source materials from which to
isolate compounds. Further, the variety of compounds via natural
products is not unlimited, as plants and other living organisms do not
make every compound theoretically possible.
Alternatively, new compounds have been prepared
synthetically, i.e., by the creation of compounds in the laboratory rather
than through the isolation of naturally occurring compounds. The
synthetic generation of compounds utilizes the principles and
methodologies of organic chemistry, especially reaction mechanisms.
Compounds are created by deliberate "rational" approaches in which the
structure of a desired compound is first determined or conceived and a
synthesis strategy is then developed. This approach appears to be
reaching its zenith in the field of drug design, where computer-assisted
structure-activity studies are performed to generate rational drug design.
However, in general, rational drug design has not achieved the
successes initially envisioned.
Once a desired compound structure is identified, a strategy
for synthetic preparation is developed. Traditional strategies for organic
synthesis are serial synthesis or assembly of subunits in parallel, or a
combination thereof. Serial synthesis involves the modification of one
compound to form another compound, which in turn is chemically
transformed (and so on) until the desired compound is synthesized.
Assembly of subunits in parallel involves the synthesis of "portions" of a
desired compound, with production of the desired compound resulting
from the joining of the individual portions.
More specifically, current techniques to synthesize desired
compounds through a sequence of catalyzed reactions are based on the
control of each reaction step in a sequential synthesis to optimize the
yield of each intermediate compound used along the pathway of
synthesis of the specific desired terminal product compound. The logic
of this established procedure rests on the fact that the structure of the
desired terminal product molecule is known beforehand, and that a
thermodynamically efficient reaction pathway leading from substrates to
the desired product exists. As noted above, two major general
strategies to synthesize a desired target compound are common in the
art. In the first, the terminal target is built up sequentially by successive
modification of a starting substrate, acted upon in conjunction with other
possible substrates, either by enzymes or careful choice of reaction
conditions. A simple example is the sequential chemical synthesis of a
desired peptide by cycles of protection and deprotection of the growing
peptide chain as a succession of activated amino acids is added one by
one. A second major alternative strategy in the art is the synthesis of a
desired chemical compound by the successive synthesis of increasingly
complex sets of building blocks which are finally joined to make the
desired target. A simple example is the synthesis of a specific
hexapeptide (ABCDEF) from the amino acid monomers A, B, C, D, E, F,
by the synthesis of the dipeptides AB, CD, EF, then the joining of the
dipeptides to form the hexapeptide. The same two general strategies
are utilized in many areas of synthetic chemistry with a variety of
different organic compounds. Both strategies are hindered by a variety
of problems, including the necessity for knowledge about, and use of,
prespecified reaction pathways.
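
The two conventional strategies described above can be contrasted with a small sketch. It is a toy illustration only: peptides are modelled as plain strings, and the function names and the `block_size` parameter are hypothetical, not taken from the application. Both functions need the target sequence ABCDEF to be specified beforehand, which is the limitation the paragraph points out.

```python
def serial_synthesis(monomers):
    """Serial strategy: extend one growing chain by a single monomer per step."""
    chain = monomers[0]
    intermediates = []
    for monomer in monomers[1:]:
        chain = chain + monomer        # one coupling per added monomer
        intermediates.append(chain)
    return chain, intermediates


def convergent_synthesis(monomers, block_size=2):
    """Building-block strategy: make fixed-size blocks, then join them in order."""
    blocks = ["".join(monomers[i:i + block_size])
              for i in range(0, len(monomers), block_size)]
    target = "".join(blocks)           # final joining of the blocks
    return target, blocks


monomers = list("ABCDEF")
target_serial, intermediates = serial_synthesis(monomers)
target_convergent, blocks = convergent_synthesis(monomers)
print(intermediates)
print(blocks)
```

The serial route passes through every chain prefix, while the building-block route passes through the dipeptides AB, CD, and EF before the final joining.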
In summary, current approaches to the acquisition of new
and useful compounds are subject to a variety of limitations. Thus, there
is a need in the art for a method for generating new compounds without
the necessity for predetermining chemical structures, compositions, or
synthesis pathways. The present invention fulfills these needs and
further provides other related advantages.



Summary of the Invention
In contrast to the current approaches to the acquisition of
new and useful compounds, the present invention eliminates the need to
know the structure or chemical composition of the desired compound
prior to its synthesis. The disclosure of the present invention provides
that a diversity of unknown compounds may be produced by "random"
chemistry, and such a diversity may be screened for one or more
desired properties to detect the presence of suitable compounds. It is
central to the subject methods that one does not need to know in
advance the structure or composition of the useful compound sought.
Briefly stated, the present invention provides methods for
the production of an organic molecule having a desired property, or for
the generation and characterization of an organic molecule having a
desired property.
In accordance with a first aspect of the present invention,
the method comprises first providing a starting group of different organic
molecules. At least one chemical reaction is caused to take place with
at least some of the different organic molecules in the starting group to
create an intermediate reaction mixture having one or more organic
molecules different from the organic molecules in the starting group.
This step of causing at least one chemical reaction to take place is
repeated at least once. Each repetition uses the reaction mixture of the
previous step, and in the end produces a final reaction mixture as a
result of the last repetition. The final reaction mixture is screened for the
presence of the organic molecule having the desired property.
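
The react, repeat, and screen sequence of the first aspect can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not a model of real chemistry: molecules are strings, the single "reaction" `condense` simply joins pairs of molecules up to a hypothetical length limit, and the screened property (equality with the string "ABBA") is an arbitrary stand-in for an assay. All names are hypothetical.

```python
from itertools import product


def condense(mixture, max_len=4):
    """Toy reaction: join every ordered pair of molecules, keeping the originals."""
    products = {a + b for a, b in product(mixture, repeat=2)
                if len(a + b) <= max_len}
    return set(mixture) | products


def run_rounds(starting_group, reactions):
    """Apply each reaction to the mixture left by the previous round."""
    mixture = set(starting_group)
    for react in reactions:
        mixture = react(mixture)
    return mixture


def screen(mixture, has_property):
    """Return the molecules in the mixture that show the desired property."""
    return sorted(m for m in mixture if has_property(m))


def is_hit(molecule):
    """Arbitrary stand-in for an assay of the desired property."""
    return molecule == "ABBA"


after_one_round = run_rounds({"A", "B"}, [condense])
final_mixture = run_rounds({"A", "B"}, [condense, condense])
hits = screen(final_mixture, is_hit)
print(hits)
```

In this toy run the hit is absent after a single round and appears only once the reaction step has been repeated on the intermediate mixture, and the screen never needs the structure of the hit as an input to the reactions.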
In accordance with an embodiment of the first aspect, the
method for the production of an organic molecule having a desired
property as described above is performed. If the screening step of this
aspect is successful in detecting the organic molecule having the
desired property in the final reaction mixture, then the following
additional steps are performed. The starting group of different organic
molecules is divided into at least two subgroups, each containing less
than all of the different organic molecules in the starting group. The
chemical reactions are performed on each of the subgroups in the same
way as with the starting group to produce a final reaction submixture
corresponding to each of the subgroups. Each of the final reaction
submixtures resulting from this step is screened for the presence of the
organic molecule having the desired property. These additional steps
are repeated at least once for each of the successful subgroups from
which the organic molecule having the desired property is produced by
substituting the successful subgroup as the subgroup in the first
additional step to thereby identify a narrowed group of different organic
molecules from which the compound having the desired property can be
produced.
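
The subgroup-narrowing steps just described can be sketched with the same kind of toy model. The sketch assumes string molecules, a hypothetical `condense` reaction, and one particular way of forming subgroups (each subgroup omits exactly one starting molecule); the text only requires at least two subgroups that each contain less than the full starting group, so other divisions are equally possible.

```python
from itertools import product


def condense(mixture, max_len=4):
    """Toy reaction: join every ordered pair of molecules, keeping the originals."""
    products = {a + b for a, b in product(mixture, repeat=2)
                if len(a + b) <= max_len}
    return set(mixture) | products


def produces_hit(group, reactions, has_property):
    """Run the same reactions on a (sub)group and screen the final submixture."""
    mixture = set(group)
    for react in reactions:
        mixture = react(mixture)
    return any(has_property(m) for m in mixture)


def narrow_starting_group(starting_group, reactions, has_property):
    """Shrink the starting group while a subgroup still yields the hit."""
    group = sorted(starting_group)
    if not produces_hit(group, reactions, has_property):
        return None  # the additional steps apply only after a successful screen
    while len(group) > 1:
        # Assumed division: one subgroup per molecule, each leaving that one out.
        subgroups = [group[:i] + group[i + 1:] for i in range(len(group))]
        successful = [s for s in subgroups
                      if produces_hit(s, reactions, has_property)]
        if not successful:
            break  # every remaining molecule is needed
        group = successful[0]  # substitute a successful subgroup and repeat
    return group


narrowed = narrow_starting_group({"A", "B", "C", "D"},
                                 [condense, condense],
                                 lambda m: m == "ABBA")
print(narrowed)
```

Here the four-member starting group is narrowed to the two molecules actually required, without the structure of the hit ever being used to choose them.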
In one embodiment, the method comprises the steps of:
(a) reacting a group of different substrates, the group
comprising acids, amines, alcohols, and unsaturated compounds, under
suitable conditions with a dehydrating agent to yield a first reaction
mixture;
(b) reacting the first reaction mixture with a reducing
agent under suitable conditions to yield a second reaction mixture;
(c) reacting the second reaction mixture with an
oxidizing agent under suitable conditions to yield a third reaction
mixture;
(d) performing a condensation reaction under suitable
conditions upon the third reaction mixture to yield a fourth reaction
mixture;
(e) exposing the fourth reaction mixture to light within a
wavelength of about 220 nanometers to 600 nanometers, thereby
producing one or more organic molecules different from the substrates
and agents;
(f) screening the exposed fourth reaction mixture for the
presence of an organic molecule having the desired property; and
(g) isolating from the exposed fourth reaction mixture
the organic molecule having the desired property.
In an alternative embodiment, any subset of steps a-e
above may be performed in any order prior to steps f and g. Further,
steps a-e or any subset of these may be repeated in any order prior to
steps f and g. Similarly, exposure to other reagents, singly, sequentially,
or simultaneously, may be substituted for steps a-e, prior to steps e and
f.
In another embodiment of the first aspect, the method
comprises the steps of:
(a) reacting a group of different substrates, the group
comprising acids, amines, alcohols, and unsaturated compounds, under
suitable conditions with a dehydrating agent to yield a first reaction
mixture;
(b) reacting the first reaction mixture with a reducing
agent under suitable conditions to yield a second reaction mixture;
(c) reacting the second reaction mixture with an
oxidizing agent under suitable conditions to yield a third reaction
mixture;
(d) performing a condensation reaction under suitable
conditions upon the third reaction mixture to yield a fourth reaction
mixture;
(e) exposing the fourth reaction mixture to light within a
wavelength of about 220 nanometers to 600 nanometers, thereby
producing one or more organic molecules different from the substrates
and agents;
(f) screening the exposed fourth reaction mixture for the
presence of an organic molecule having the desired property; and
(g) determining the structure or functional properties
characterizing the organic molecule having the desired property.
Any subset of steps a-e above may be performed in any
order prior to steps f and g. Further, steps a-e or any subset of these
may be repeated in any order prior to steps f and g. Similarly, exposure
to other reagents, singly, sequentially, or simultaneously, may be
substituted for steps a-e, prior to steps e and f.
In accordance with a second aspect of the present
invention, the method comprises the steps of:
(a) reacting a group of different enzymes representing a
diversity of catalytic activities under suitable conditions with a group of
different substrates, thereby producing one or more organic molecules
different from the enzymes and substrates in the reaction mixture;
(b) screening the reaction mixture for the presence of an
organic molecule having a desired property; and
(c) isolating from the reaction mixture the organic
molecule having the desired property.
In one embodiment of the second aspect, the method
comprises the steps of:
(a) reacting a group of different enzymes representing a
diversity of catalytic activities under suitable conditions with a group of
different substrates, thereby producing one or more organic molecules
different from the enzymes and substrates in the reaction mixture;
(b) screening the reaction mixture for the presence of an
organic molecule having the desired property; and
(c) determining the structure or functional properties
characterizing the organic molecule having the desired property.
Other aspects of the invention will become evident upon
reference to the following detailed description.
Detailed Description of the Invention
In the first aspect of the present invention, the method
comprises first providing a starting group of different organic molecules.
At least one chemical reaction is caused to take place with at least some
of the different organic molecules in the starting group to create an
intermediate reaction mixture having one or more organic molecules
different from the organic molecules in the starting group. This step of
causing at least one chemical reaction to take place is repeated at least
once. Each repetition uses the reaction mixture of the previous step,
and in the end produces a final reaction mixture as a result of the last
repetition. The final reaction mixture is screened for the presence of the
organic molecule having the desired property.
As noted above, in another aspect, a diversity of
compounds is generated from a group of substrates which are subjected
to a group of enzymes representing a diversity of catalytic activities. In
still another aspect of the present invention, a diversity of compounds is
generated from a group of substrates which are subjected to a variety of
conditions, in the absence of enzymes. An embodiment of either aspect
utilizes a group of substrates with different core structures. Another
embodiment of either aspect utilizes a group of substrates with similar or
identical core structures, but a variety of different functional groups as
substituents. The latter embodiment permits the creation of a diversity of
compounds centered around a particular compound or a particular class
of compounds.
The methods of the present invention are employed to
generate new compounds having a desired property. Examples of
preferred desired properties include the ability to function as drugs,
vaccines, liganding agents, catalysts, catalytic cofactors, structures of
use, detector molecules, and building blocks for other compounds. A
liganding agent may bind, for example, to protein, DNA, RNA,
carbohydrate, enzyme, receptor, or membrane. Liganding agents
include agonists and antagonists, such as competitive inhibitors of
enzymes or hormones. Structures of use include low energy structures
(e.g., structures capable of self assembly) and material structures, like
silk. Detector molecules include compounds having optical reporter
properties of interest. A new compound may mimic, modulate, enhance,
antagonize, modify, or simulate a substance. Specific molecules of
interest include molecules: (1) able to bind to a helper T cell receptor of
specific clones of helper T cells (e.g., such binding leads to amplification
or deletion of specific helper T cell clones); (2) able to be incorporated
into DNA or RNA in place of normal nucleotides (e.g., such incorporation
alters biological activity); and (3) able to act as a substrate for an
enzyme or modify the activity of an enzyme (e.g., may modify the
binding activity of a biological molecule). Such molecules are useful for
a variety of diagnostic and therapeutic purposes. Other specific
molecules of interest include oral contraceptives and molecules with
improved properties over analgesics like naproxen, or protease inhibitors
like captopril, antitumor agents like mitomycin, antibiotics like
vancomycin, and antifungals like amphotericin.
Substrates for the processes described herein include all
organic compounds. A preferred group of substrates includes alkanes,
alkenes, alkynes, arenes, alcohols, ethers, amines, aldehydes, ketones,
acids, esters, amides, cyclic compounds, heterocyclic compounds,
organometallic compounds, hetero-atom bearing compounds, amino
acids, and nucleotides. A more preferred group of substrates includes
acids, amines, alcohols, amino acids, nucleotides, and unsaturated
compounds, such as alkenes and alkynes. The most preferred group of
substrates is amino acid-based compounds (e.g., amino acids, peptides
and polypeptides), nucleotide-based compounds (e.g., nucleotides and
nucleosides), and combinations thereof. These substrates may include
additional functional groups as substituents and may be acyclic, cyclic,
and heterocyclic in nature. The acids, amines and alcohols can be
primary, secondary, carboxylic, phosphoric, sulfonic, aromatic,
heterocyclic, aliphatic, etc. For increased reactivity, primary amines and
alcohols are preferred.
An alternative to the selection of different substrates with a
wide variation in their overall structures is to choose substrates that
include compounds which are different but share one or more common
structural features with a molecule of interest or a class of molecules of
interest. Thus, the diversity of compounds to be generated would be
created around a molecule of interest or a class of molecules of interest.
For example, a ringed compound, such as a steroid, may be selected
and then a variety of different derivatives obtained. Derivatives include
the addition and/or deletion of functional groups, and acyclic
compounds with ringed substituents similar to a portion of the original
cyclic compound. Such derivatives are subjected to the random
chemistry processes described herein to generate a greater diversity,
from which a compound having a desired property may be detected for
further characterization, with or without isolation. For example, a group
of substrates consists of related compounds, which are then subjected
to the methods without enzymes as described herein. Alternatively, a
group of substrates consists of related compounds plus reagents, which
are then subjected to the methods with enzymes as described herein. A
variation upon these embodiments of the present invention is to
generate derivatives using the random chemistry processes described
herein, and then subject such derivatives to these processes to generate
a greater diversity, from which a compound having a desired property
may be detected for further characterization, with or without isolation.
Classes of molecules, which are preferred focal points from
which to obtain derivatives to serve as substrates, include heterocycles,
steroids, alkaloids, and peptides/mimetics (including constrained
molecules, e.g., constrained by S-S disulfide bonds). Examples of
heterocycles include purines, pyrimidines, benzodiazepines, beta-lactams,
tetracyclines, cephalosporins, and carbohydrates. Examples of steroids
include estrogens, androgens, cortisone, and ecdysone. Examples of
alkaloids include ergots, vinca, curare, pyrrolizidine, and mitomycins.
Examples of peptides/mimetics include insulin, oxytocin, bradykinin,
captopril, enalapril, and neurotoxins (e.g., from snails, snakes, etc.).
In one aspect, the present invention provides methods for
generation of new compounds wherein a group of substrates is acted
upon by a group of "enzymes," such that a diversity of product
molecules is formed. As used herein, the term "enzyme" includes
enzymes (e.g., naturally or non-naturally occurring or produced),
catalysts (e.g., catalytic surfaces), candidate catalysts and candidate
enzymes (e.g., antibodies, RNA, DNA or random peptides/polypeptides).
In one embodiment, the method comprises the steps of: (a) reacting a
group of different enzymes representing a diversity of catalytic activities
under suitable conditions with a group of different substrates, thereby
producing one or more organic molecules different from the enzymes
and substrates in the reaction mixture; (b) screening the reaction mixture
for the presence of an organic molecule having a desired property; and
(c) isolating from the reaction mixture the organic molecule having the
desired property.
In another embodiment, the method comprises the steps
of: (a) reacting a group of different enzymes representing a diversity of
catalytic activities under suitable conditions with a group of different
substrates, thereby producing one or more organic molecules different
from the enzymes and substrates in the reaction mixture; (b) screening the
reaction mixture for the presence of an organic molecule having the
desired property; and (c) determining the structure or functional
properties characterizing the organic molecule having the desired
property.
From a library of product molecules produced by the
methods provided herein, those of practical interest are characterized.
As noted above, it is central that, in the present procedures, one does
not need to have prior knowledge of the structure or composition of the
useful molecule sought. This aspect of the present invention rests on
catalysis of, or otherwise causing, a sufficient diversity of reactions
among a group of initial substrates, such that a diversity of further
products are formed. In order to more fully appreciate the diversities of
products which may be generated by the methods of the present
invention, it may be helpful to consider a statistical analysis of the
average properties of reaction graphs among a set of molecules, as well
as the average properties of the catalyzed reaction subgraph among
these molecules which is formed when the molecules are incubated in
the presence of candidate enzymes or catalysts which may catalyze one
or more of the reactions.
A reaction graph is the proper mathematical description of
a set of organic molecules and all the reactions that those molecules
can undergo. Organic reactions can be categorized into classes by the
number of substrate and number of product molecule species. A first
class transforms a single substrate into a single product. An
isomerization reaction, catalyzed by an isomerase, is an example. A
second class joins two substrates to form one product. A dehydration
reaction joining two nucleotides by an ester bond is an example. Such
reactions are commonly catalyzed by ligases. A third broad class
cleaves one substrate into two products. Cleavage of a polynucleotide
by a phosphodiesterase is a familiar example, as are many steps in
intermediate metabolism. Finally, a fourth class transforms two
substrates into two products. Often this occurs by transfer of a reactive
group from one of the two initial substrates to the second substrate.
A convenient representation of a reaction graph denotes
each organic molecule species as a point in three dimensional space.
One or two lines lead from the one or two substrate molecules to the
one or two product molecules derived from the reaction of the
substrates. Arrows on the lines leaving the
substrates point into a box denoting the reaction. Arrows leaving the
reaction box for the products point toward the products. Since reactions
are reversible, the arrows merely indicate one possible direction of the
reaction. The set of all such arrows and boxes, representing all the
reactions among all the organic molecules in the system, comprises the
reaction graph.
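The representation described above can be sketched as a small data structure. The following Python sketch is illustrative only: the names Reaction, reaction_class, and build_reaction_graph are assumptions introduced here and do not appear in the specification. Each reaction is stored as a "box" node, with directed arrows from each substrate into the box and from the box to each product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One reaction box: one or two substrates, one or two products."""
    substrates: tuple
    products: tuple

    def reaction_class(self):
        # (number of substrate species, number of product species),
        # e.g. (1, 1) isomerization, (2, 1) ligation, (1, 2) cleavage,
        # (2, 2) group transfer.
        return (len(self.substrates), len(self.products))


def build_reaction_graph(reactions):
    """Return (molecules, edges) for a list of Reaction objects.

    Edges are directed pairs: (substrate, box) and (box, product).
    Box labels such as "rxn0" are hypothetical identifiers.
    """
    molecules = set()
    edges = []
    for index, reaction in enumerate(reactions):
        box = f"rxn{index}"
        for substrate in reaction.substrates:
            molecules.add(substrate)
            edges.append((substrate, box))
        for product in reaction.products:
            molecules.add(product)
            edges.append((box, product))
    return molecules, edges
```

Because reactions are reversible, the arrow direction stored here records only one possible direction of each reaction, as in the text.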
An important feature of reaction graphs is that, for almost
any initial set of organic molecules, the reaction graph in which that set
is considered as substrates will also require addition of new organic
molecules (i.e., molecules not in the initial set of substrates) where those
new organic molecules are the products of one or more of the possible
reactions among the initial set of substrates. In a mathematical process,
called the "growth of the reaction graph", the reaction graph "grows" by
iterations. At the first step, a set of initial substrate molecules is listed.
At the next step, the reaction graph among those substrates is formed
mathematically, and a second iterate of the reaction graph is formed by
listing both the initial substrates plus any new organic molecule products
of the possible reactions. At the third step, all the possible organic
reactions among this now enlarged set of organic molecules are written
down. This new reaction graph may indicate that still further novel
organic molecules are products of the reactions now possible. Over a
succession of iterations of this mathematical graph growth process, the
set of organic molecules included in the graph may increase enormously
compared to the initial set of substrates. This successive increase is
called "supracritical behavior." Another possible mathematical behavior
of the reaction graph growth processes is that a few new products may
be formed on the first graph growth cycle, and successively fewer on the
successive graph growth cycles, until no further new product molecules
are generated. Behaviors in which graph growth is limited are termed
"subcritical."
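The iterative growth process described above can be illustrated with a toy model. In this hedged Python sketch, molecules are plain strings and a "rule" function stands in for chemistry; the rules ligation and inert_except_ab, and the length cap max_len, are hypothetical devices chosen only to keep the example finite. They are not reactions taken from the specification.

```python
from itertools import combinations_with_replacement


def grow_reaction_graph(founders, rule, max_iterations):
    """Iterate graph growth; return the molecule count after each cycle.

    `rule(a, b)` returns the products obtainable from the pair (a, b).
    Growth stops early once a cycle yields no new molecule.
    """
    molecules = set(founders)
    sizes = [len(molecules)]
    for _ in range(max_iterations):
        new = set()
        for a, b in combinations_with_replacement(sorted(molecules), 2):
            new |= set(rule(a, b)) - molecules
        if not new:
            break
        molecules |= new
        sizes.append(len(molecules))
    return sizes


def ligation(a, b, max_len=4):
    # Toy "supracritical-like" rule: any two molecules may be joined in
    # either order, up to an arbitrary length cap.
    return [p for p in (a + b, b + a) if len(p) <= max_len]


def inert_except_ab(a, b):
    # Toy "subcritical" rule: only A and B react, giving a single product.
    return ["AB"] if {a, b} == {"A", "B"} else []


print(grow_reaction_graph({"A", "B"}, ligation, 5))         # [2, 6, 30]
print(grow_reaction_graph({"A", "B"}, inert_except_ab, 5))  # [2, 3]
```

Under the first rule the set grows on each cycle until the artificial cap is reached; under the second, growth stops after a single new product, which corresponds to the limited growth the text calls subcritical.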
If a set of organic molecules and a set of "enzymes," as
defined herein, are present in reaction conditions allowing the enzymes
to act on the organic molecules, then the natural mathematical
representation of the total system is the reaction graph, as defined
above, plus an accounting of which enzymes catalyze which reactions.
The latter accounting of enzymes and the reactions catalyzed comprises
the catalyzed reaction subgraph of the reaction graph. This mathematical
representation is formed by noting, for each candidate enzyme, which
reactions if any it catalyzes. An arrow may then be drawn from that
enzyme to the reaction box representing the reaction catalyzed, and the
arrows into and out of the box representing transformations of
substrate(s) into product(s) can be noted in a convenient way, e.g., by
coloring those arrows "red." The set of all red arrows represents the
reactions which are catalyzed by one or more of the candidate enzymes
present in the system.
Just as the reaction graph itself may be subcritical or
supracritical in its behavior, so too may the catalyzed reaction subgraph
among the organic molecules. In this case, one considers only the
catalyzed reactions among the initial "founder" substrates. The catalyzed
reactions lead to products not in the founder set of substrates. These
new products are available, together with the initial founder set of
substrates, to allow further reactions, some of which might be catalyzed
by the set of candidate enzymes present in the system. Over a
succession of iterations, this process of catalyzed reaction growth may
increase vastly in diversity, in a supracritical mode. Alternatively, the set
of novel molecules formed via catalyzed reactions may dwindle over
successive iterations of the growth of the catalyzed reaction graph. This
is a form of subcritical behavior.
The total behavior of the system is represented by the
behavior of the reaction graph plus the catalyzed reaction subgraph,
over iterations. The uncatalyzed reactions represent reactions that occur
spontaneously. Whether a reaction graph behaves subcritically or
supracritically depends upon the diversity of founder substrates and the
diversity of candidate enzymes present in the system. In addition, the
behavior is dependent upon factors including concentrations of all
reagents, solubility of the organic molecules, and directions of deviation
from equilibrium across each reaction.
In general, a phase transition from subcritical to
supracritical behavior of a reaction system is governed by the diversity of
organic molecules and the diversity of enzymes in the system. Systems
with low diversity of both organic molecules and enzymes are typically
subcritical. Systems with high diversities of organic molecules
alone, low diversities of organic compounds together with high
diversities of enzymes, or high diversities of both are typically
supracritical. Systems of organic molecules alone without addition of
exogenous enzymes can be supracritical, because the spontaneous
reaction graph is supracritical, or because some of the substrates or
products are enzymes themselves in the sense defined above. In one of
its forms, the present invention takes advantage of the mathematical
phase transition between subcritical and supracritical behavior to choose
reaction conditions which yield high diversity libraries of organic
molecules from a founder set of organic molecules.
The general character of the phase transition from
subcritical to supracritical behavior can be illustrated, by way of a non-
limiting example, based on the preferred use of a cloned library of
antibody molecules as the candidate set of enzymes, and a set of
substrates which, without loss of generality, can be taken to be peptides
containing mixtures of D and L amino acids and nonnatural amino acids,
or can be taken to be small polynucleotides, or a wide variety of other
organic molecules. To illustrate the general character of the phase
transition it is useful to estimate the number of reactions in a reaction
graph with a given number of small organic molecules. In general, the
number of reactions is not known. However, minimum realistic estimates
are obtainable. For example, a founder set of peptides made of D and L
amino acids and non-natural amino acids, each with 10 amino acids,
may be used as substrates. The number of possible substrates is very
large, and given by the number of kinds of amino acids raised to the
tenth power. Any two peptides of length ten can undergo transpeptidation
reactions cleaving and exchanging the terminal amino acid(s)
subsequences at any of the internal peptide bonds of each of the
decapeptides. Since there are 9 internal bonds in each, any pair of
decapeptides can undergo 81 such transpeptidation reactions. In each
case, two substrates yield two products. Since any pair of decapeptides
can undergo 81 transpeptidation reactions, it is clearly an underestimate
to suppose that the two peptides can undergo only 1 transpeptidation
reaction. But even with this clear underestimate, the number of
reactions in a system with a diversity of N types of peptides is equal to
the number of possible pairs of peptides, hence equal to N squared.
The same general features occur with many classes of organic
molecules undergoing reactions with two substrates and two products.
For most pairs of organic molecules, it is conservative to estimate that
the two can undergo at least one reaction to form two products. Thus,
in general, N squared is a conservative estimate of the diversity of
reactions in a reaction graph with N kinds of organic molecules.
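The counting argument above can be checked with a few lines of arithmetic. This Python sketch is offered only as an illustration; the function names are assumptions introduced here, and the figure of 20 kinds of amino acids used in the last line is a hypothetical input, since the text does not fix the number of kinds.

```python
def transpeptidation_count(length_a, length_b):
    """Transpeptidation reactions for one pair of peptides.

    A peptide of length L has L - 1 internal peptide bonds, and a cut may
    be chosen independently in each of the two peptides.
    """
    return (length_a - 1) * (length_b - 1)


def conservative_reaction_estimate(n_species):
    """The text's conservative estimate: one reaction per pair, N squared."""
    return n_species ** 2


def possible_peptides(n_kinds, length=10):
    """Number of distinct peptides: kinds of amino acid raised to the length."""
    return n_kinds ** length


print(transpeptidation_count(10, 10))        # 81
print(conservative_reaction_estimate(1000))  # 1000000
print(possible_peptides(20))                 # 10240000000000 (20 is assumed)
```

The first line reproduces the 81 reactions per pair of decapeptides, and the second the one million reactions among 1,000 species used in the example that follows in the text.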
Again, as a non-limiting example to illustrate the general
character of the phase transition in catalyzed reaction graphs, a set of
100,000,000 cloned human antibody molecules is used as the set of
candidate enzymes. Based on the statistics of generating catalytic
antibodies (as described below), the probability that a randomly chosen
antibody molecule is able to catalyze a randomly chosen reaction is
between 10^-5 and 10^-8 (Pollack et al., Science 234:1570, 1986;
Tramontano et al., Science 234:1566, 1986; Tramontano et al., Proc.
Natl. Acad. Sci. USA 83:6736, 1986; Jacobs et al., J. Am. Chem. Soc.
109:2174, 1987; Pollack and Schultz, Cold Spring Harbor Symposium on
Quantitative Biology Vol. 52, 1987; Tramontano et al., Cold Spring Harbor
Symposium on Quantitative Biology Vol. 52, 1987). The more
conservative estimate, 10^-8, may be used for illustration. Reaction
systems in which the diversity of substrate molecules is varied are
considered, and this diversity noted on the Y axis of the Cartesian
coordinate system. Simultaneously, the diversity of the candidate set of
enzymes is varied and noted on the X axis. Low diversity of substrates
and candidate enzymes almost certainly yields subcritical reaction
systems. To be concrete, a system with two substrates, hence one
reaction, and a single randomly chosen antibody molecule is
considered. The chance that this antibody molecule acts as a catalyst
for any of the four single reactions afforded by the substrates is
10^-8. Thus, almost certainly, no catalyst for the reaction is present in the
system, and the formation of no novel product is catalyzed. The system
is subcritical. A high diversity of substrates and candidate enzymes will
be supracritical with high probability. A diversity of 1,000 organic
molecules is incubated with a diversity of 1,000,000 antibody molecules
in an appropriate reaction vessel. The number of reactions among the
1,000 organic molecules is at least, by the conservative estimate,
1,000,000. Each reaction might be catalyzed by any one of the
1,000,000 candidate antibody enzymes, and each antibody has a chance
of one in a hundred million of being able to act as a catalyst for each
reaction. Thus, the expected number of reactions for which antibody
catalysts are present in the system is 10^6 x 10^6/10^8 = 10^4. Thus, 10,000
reactions among the million possible should be catalyzed by one or
more of the antibodies present. Therefore, as these catalyzed reactions
occur, the products of the 10,000 reactions will be formed. Most of
these will differ from the 1,000 substrate molecules initially present.
Thus, the diversity of the set of organic molecules has increased. After
sufficient time has elapsed for the concentrations of these novel
molecules to increase sufficiently, the new system has a diversity of
substrates on the order of 10,000 rather than 1,000, hence, now a
diversity of 10,000 squared reactions are possible among the enlarged
set of substrates. The expected number of reactions which now find
catalysts among the antibody molecules is thus 10^8 x 10^6/10^8 or
1,000,000. Thus, within two reaction steps of the founder set of 1,000
organic molecules, the diversity of organic molecules has increased to
about 1,000,000. Over successive reaction cycles, diversity will increase.
This is supracritical behavior.
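The arithmetic of this example can be reproduced directly. The Python sketch below is a bookkeeping aid only, using exact fractions; the function names are assumptions, and the cascade follows the order-of-magnitude convention of the text, in which the new substrate diversity is taken to be roughly the number of catalyzed reactions.

```python
from fractions import Fraction


def expected_catalyzed(n_reactions, n_enzymes, one_in):
    """Expected number of reactions that find a catalyst.

    Each of n_enzymes candidates catalyzes a given reaction with
    probability 1 / one_in.
    """
    return Fraction(n_reactions * n_enzymes, one_in)


def diversity_cascade(n_founders, n_enzymes, one_in, steps):
    """Substrate diversity over successive steps, with N squared reactions
    assumed possible among N molecules at each step."""
    diversity = [n_founders]
    for _ in range(steps):
        n = diversity[-1]
        diversity.append(expected_catalyzed(n * n, n_enzymes, one_in))
    return diversity


# Two substrates, one reaction, one antibody: essentially no catalysis.
print(expected_catalyzed(1, 1, 10**8))            # 1/100000000
# 1,000 molecules and 1,000,000 antibodies, two reaction steps.
print(diversity_cascade(1000, 10**6, 10**8, 2))
```

The second call returns values equal to 1,000, then 10,000, then 1,000,000, matching the two reaction steps worked through above.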
In general, in the X-Y plane, a roughly hyperbolic curve
separates a subcritical regime near the origin, representing low
diversities of founder substrates and enzymes, from a supracritical regime
with high diversity of initial substrates, enzymes or both. With a fixed low
diversity of substrates, the system can cross into the supracritical regime
if a high enough diversity of enzymes is present. Conversely, if the
enzyme diversity is fixed rather low, the system can cross into the
supracritical regime if a high enough diversity of substrates is present.
The actual shape of this roughly hyperbolic curve depends upon the
specific way the number of reactions increases as substrate diversity
increases, which in turn depends upon the particular set of organic
molecules used as founder substrates. The curve also depends upon
the distribution of probabilities that antibody molecules catalyze the
different reactions afforded by the founder substrates and their products.
However, for all these cases, a sufficient diversity of both substrates and
enzymes leads to supracritical behavior. The diversity of organic
molecules in the system will increase dramatically via the catalysis of
connected webs of reactions leading from the founder set of organic
molecules to an increasing diversity of their products.
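One simple way to see why the boundary is roughly hyperbolic is sketched below. This is an illustrative assumption, not a criterion stated in the specification: with N squared possible reactions, E candidate enzymes, and a catalysis probability of 1 in one_in, the system is treated as supracritical when the expected number of catalyzed reactions exceeds the current diversity N, that is, when N x E exceeds one_in. The boundary E = one_in / N is then a hyperbola in the substrate-enzyme plane. All function names are hypothetical.

```python
from fractions import Fraction


def catalyzed_per_molecule(n_substrates, n_enzymes, one_in):
    """Expected catalyzed reactions divided by current diversity:
    (N^2 * E / one_in) / N = N * E / one_in."""
    return Fraction(n_substrates * n_enzymes, one_in)


def is_supracritical(n_substrates, n_enzymes, one_in):
    """Assumed criterion: catalyzed reactions outnumber existing molecules."""
    return catalyzed_per_molecule(n_substrates, n_enzymes, one_in) > 1


def critical_enzyme_diversity(n_substrates, one_in):
    """Enzyme diversity on the assumed boundary curve E = one_in / N."""
    return Fraction(one_in, n_substrates)


print(is_supracritical(1000, 10**6, 10**8))   # True
print(is_supracritical(2, 1, 10**8))          # False
print(critical_enzyme_diversity(1000, 10**8)) # 100000
```

Under this assumed criterion the two worked cases fall on the expected sides of the curve, and either axis alone can carry the system across it, as the text describes. A different dependence of reaction number on diversity would change the curve's shape.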
It is important to emphasize that a system of substrates
alone can explode into a diversity of products, even in the absence of
exogenously supplied enzymes, if the substrates are present in sufficient
diversity and high enough concentrations to interact on a reasonable
time scale. For example, a large diversity will be generated if the
spontaneous reaction graph is supracritical. However, systems with
exogenously added enzymes are preferred.
The human antibody repertoire is used herein as a non-
limiting example of a set of candidate enzyme molecules. As is known
in the art, the combinatorial diversity of human antibody molecules due
to genomic rearrangement is on the order of 100,000,000 (prior to the
onset of somatic mutation during maturation of the immune response
which further increases potential diversity). As is also known in the art,
antibody molecules can function to catalyze a wide variety of reactions
with a rate (Vmax) acceleration of three to eight orders of magnitude
compared to the spontaneous reaction. Such catalytic antibodies are
commonly generated by immunization of an immune competent animal
with a molecule that is a stable analogue of the transition state of the
desired reaction. Monoclonal antibodies are generated from this
immunization, and each is tested for its capacity to catalyze the desired
reaction. Because the stable analogue is similar chemically to the
transition state of the reaction, typically on the order of 5% to 10% of the
monoclonal antibodies tried are able to catalyze the desired reaction.
Presumably the catalysis reflects high affinity for the transition state and
lower affinity for substrates and products.
It is possible to estimate the probability that a randomly
chosen antibody molecule will be able to catalyze a given, randomly
chosen reaction. The fraction of B cells which respond to immunization
with an arbitrary epitope bearing antigen is on the order of one in a
hundred thousand. B cells which respond to an antigen typically have
modestly high affinities for the antigen to be triggered to divide. Thus,
the probability that a randomly chosen antibody molecule can bind with
modest affinity, 104 M-', to an arbitrary antigen is about one in a hundred
thousand. The monoclonal antibodies used to create catalytic
antibodies may have undergone further somatic mutation that increased
affinity for the antigen. It is reasonable to estimate the probability that a
randomly chosen antibody has high affinity for an arbitrary antigen is
about 10~ to 10~. -
It is further well known in the art that a cloned high diversity
library (10~ or more) of antibody molecules can be and has been created
in a variety of ways. Thus, such antibody libraries are a non-limiting
example of a high diversity set of candidate enzymes.
The use of a repertoire of human antibodies as a set of
candidate enzymes is a preferred, but non-limiting example of the sets of
molecules which can advantageously be used as sets of enzymes.
Additional candidate sets include the following:
(1) Libraries of fully random or partially stochastic
polynucleotide sequences, DNA or RNA, which, upon translation yield
libraries of fully or partially stochastic peptides, polypeptides or proteins.
These libraries can be cloned in prokaryotic or eukaryotic hosts to
amplify the polynucleotide sequences and obtain protein products which
constitute the candidate enzyme library. Alternatively, the polynucleotide
sequences can be amplified in vitro and translated in vitro to obtain the
candidate enzyme library. If needed, the candidate protein library can
be isolated from other molecular components by means known in the
art. For example, an advantageous means to do so uses libraries of
fusion proteins with stochastic peptides, polypeptides, or proteins fused
adjacent to, for example, ubiquitin. Antibodies to ubiquitin allow affinity
purification of the library of fusion proteins which then serves as the set
of candidate enzymes.
(2) A library of antibody molecules can be derivatized by
cloning partially stochastic DNA sequences into the hypervariable region
of the antibody molecules. A refinement of this involves cloning such
stochastic sequences into one or more of the complementarity determining
regions (CDRs), of the antibody molecule. Each CDR has on the order
of 5 to 10 amino acids. This modified library is a set of candidate
enzymes.
(3) Partially stochastic DNA sequences or RNA
sequences can be cloned into a gene encoding any protein, e.g.,
histone 1 or any other protein, to create a fusion protein with the novel
DNA or RNA at one end, or in the middle of the host protein sequence.
The well folded host protein serves as a framework to aid folding and
stability of the cloned sequences. The set of such proteins is a library of
candidate enzymes.
(4) Libraries of DNA sequences in themselves, or RNA
sequences in themselves, constitute libraries of candidate enzymes. The
existence of ribozymes and of DNA sequences able to bind arbitrary
ligands, such as thrombin, shows that both kinds of polymers are strong
candidates to bind transition states and catalyze reactions.
(5) Other libraries of combinatorial molecular diversity,
linear sequences or otherwise, as known in the art, may be used as
candidate catalysts. Sets of known enzymes alone, or together with a
small or large variety of mutant variants of those enzymes, can serve
advantageously as the candidate set of enzymes. More specifically, and
as a non-limiting example, the substrates of interest in generating a
library of molecules may be D and L amino acids, including nonnatural
amino acids, and some small dipeptides, tripeptides, and tetrapeptides
formed of these building blocks. It is known in the art that larger
peptides can be synthesized from amino acids and small peptides using
proteases, peptidases, lipases, hydrolases, and esterases
(Schellenberger and Jakubke, Angew. Chem. Int. Ed. Engl. 30:1437, 1991).
Thus, such a set of enzymes can be used jointly in a common milieu, or
sequentially, acting on a set of substrates. Further, it is known in the art
that it is possible to select mutant variants of proteases which are able to
alter substrate specificity, or alter catalytic activity in unusual solvents,
such as low water dimethylformamide solvents (Arnold, Proc. Natl. Acad.
Sci. USA, in press 1993). Thus, in a preferred embodiment of the present
invention, libraries of mutants of each of a set of enzymes of interest are
used. For each enzyme, length N, there are 19N one-mutant variants,
and on the order of that number squared of two-mutant variants. Hence,
a library of several million mutant proteins of a given enzyme, obtained
by means known in the art, can be readily prepared. For a diversity of
ten different initial enzymes, lipases, hydrolases, esterases, and
proteases, the resulting library of candidate enzymes has on the order of
100,000,000 different protein species, each a candidate enzyme. These
are then incubated with the founder substrate library of interest.
Increase of the diversity of candidate proteins from 1,000,000 (described
below in a non-limiting example based on antibody molecules) to
100,000,000, together with a maximum 10 mg/ml solubility of these
proteins, implies that product molecules will form more slowly. Hence in
a 1,000 microliter volume, it would require about 1 second to generate a
1 nanomolar concentration of a product molecule from saturated
enzymes using a diversity of 1,000,000 candidate enzymes, and about
100-fold longer using a library of 100,000,000 candidate enzymes. The
example of small D and L peptides is non-limiting. Other core building
blocks, carbohydrates, heterocyclic compounds, a variety of ~d~ts,
and otherwise, can be used as the starting library of substrates in all the
methods of the invention.
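
The mutant-library sizes quoted above can be checked with a short calculation. The following is a minimal sketch, assuming a 20-letter amino acid alphabet and an enzyme length of N = 200 residues (the length is an illustrative assumption here; the function name is hypothetical). It counts single substitutions as 19N and double substitutions exactly, which is roughly half of (19N) squared and so of the same order.

```python
from math import comb


def mutant_counts(length: int, alphabet: int = 20) -> tuple[int, int]:
    """Return (one-mutant, two-mutant) variant counts for one enzyme."""
    alternatives = alphabet - 1                  # 19 substitutions per position
    singles = alternatives * length              # the 19N figure
    # Choose 2 positions, then one of 19 substitutions at each of them.
    doubles = comb(length, 2) * alternatives ** 2
    return singles, doubles


singles, doubles = mutant_counts(200)            # N = 200 is an assumed length
library_size = 10 * (singles + doubles)          # ten different initial enzymes

print(f"one-mutant variants per enzyme: {singles:,}")
print(f"two-mutant variants per enzyme: {doubles:,}")
print(f"library from ten enzymes:       {library_size:,}")
```

Under these assumptions each enzyme contributes several million two-mutant variants, and ten enzymes give a library on the order of 100,000,000 species.
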
Where traditional protein-based enzymes are used to effect
a diversity of catalytic activities, such enzymes include oxidoreductases;
transferases; hydrolases; lyases; isomerases; and ligases.
Oxidoreductases catalyze oxidation and reduction reactions. Examples
of oxidoreductases include dehydrogenases; reductases; oxidases
(monooxygenases and dioxygenases); and peroxidases. Transferases
catalyze the transfer of functional groups. Examples of transferases
include aminotransferases (transaminases); phosphotransferases;
pyrophosphokinases; and nucleotidyltransferases (RNA and DNA
polymerases). Hydrolases catalyze the hydrolytic cleavage of bonds,
such as ester, glycosyl, and peptide bonds. Examples of hydrolases
include phosphodiesterases; amylases; proteases (peptidases,
proteinases); nucleases (exo- and endo-; ribo- and deoxyribonucleases);
and phosphatases. Lyases catalyze double bond formation by
non-hydrolytic removal of groups from substrates. Examples of lyases
include decarboxylases; anhydrases; and synthases. Isomerases
catalyze geometric or structural changes within one molecule. Examples
of isomerases include racemases; epimerases; tautomerases; and
mutases. Ligases catalyze the joining together of two molecules
coupled with the hydrolysis of a pyrophosphate bond. Examples of
ligases include synthetases.
Generation of useful high diversity libraries requires that the
substrates be soluble in the solvent, that the candidate enzymes be
soluble in the solvent, that the volume be sufficiently small and
concentrations sufficiently high that substrates and enzymes encounter
one another rapidly, and at high enough concentrations to occupy a
sufficient fraction of enzymatic sites to enhance reaction velocities, and
that the high diversity product library be present in high enough
concentrations that useful molecules can be detected. All these
requirements have been considered for the present invention. For
example, enzymes typically can tolerate some percentage of organic
solvents such as ethanol, methanol, dimethyl sulfoxide (DMSO),
dimethylformamide (DMF), or combinations thereof, in aqueous (water
based) solutions (Gupta, Eur. J. Biochem. 205:25, 1991). Thus, where
not all the substrates are water soluble, it is desirable to include
water-miscible organic solvents.
Substrates of the types indicated vary in solubility. In
general, it is reasonable to obtain millimolar concentrations of on the
order of 1,000 substrate species in small reaction volumes, on the order
of 1 to 100 microliters. Under reaction conditions such that the diversity
of these 1,000 substrates increases by a factor of 1,000,000, yielding
1,000,000,000, or a library of small molecules with a diversity of 10^9, the
average concentration will have fallen by a factor of 10^6, hence have
fallen from millimolar to nanomolar, 10^-9 M. The detection methodologies
discussed below to identify a molecule of interest are able to detect
readily in the nanomolar range, and typically are able to detect in the
picomolar, 10^-12 M, range. Thus, even with a 1,000-fold decrease in
concentrations of some products below the mean when diversity is one
billion, the detection means can detect molecules of interest. Other
detection procedures allow detection at 10^-15 to 10^-20 molar.
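
The dilution argument above is plain order-of-magnitude arithmetic, and can be sketched as follows. This is an illustrative calculation only: it assumes, as the text does, that the starting material is spread evenly over the enlarged set of species, and it works in base-10 exponents to avoid floating-point comparisons. The function and constant names are hypothetical.

```python
def mean_concentration_exponent(start_exponent: int, diversity_gain_exponent: int) -> int:
    """Return log10 of the mean per-species concentration in mol/L."""
    return start_exponent - diversity_gain_exponent


MILLIMOLAR = -3          # starting substrate concentration, 10^-3 M
DIVERSITY_GAIN = 6       # diversity grows 10^6-fold (10^3 to 10^9 species)
PICOMOLAR_LIMIT = -12    # typical detection limit quoted in the text

mean_exp = mean_concentration_exponent(MILLIMOLAR, DIVERSITY_GAIN)
rare_exp = mean_exp - 3  # a product 1,000-fold below the mean

print(f"mean concentration: 10^{mean_exp} M")
print(f"rare product:       10^{rare_exp} M")
print(f"within picomolar detection range: {rare_exp >= PICOMOLAR_LIMIT}")
```
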
The diversity of candidate enzymes in a reaction mixture is
limited by the solubility of the enzymes. For example, for proteins in
aqueous media, a 10 mg/ml concentration is typically attainable. For
candidate enzymes with 200 amino acids, on the order of 10^20 protein
molecules can be in solution in 1,000 microliters. Thus, if a diversity of
10^6 candidate enzymes is used, each will be present in 10^14 copies.
Catalytic antibodies are of modest efficiency, as noted. Using a turnover
number of 1 per second, 10^14 saturated enzymes would yield 10^14
product molecules in 1 second. In a 1,000 microliter volume the
concentration of the 10^14 product molecules would be on the order of 0.1
micromolar. Even if solubility limits were 1 mg/ml of enzymes, then
concentrations would decrease by only one order of magnitude. Thus,
high diversities of substrates and candidate enzymes can be mixed
under reaction conditions which yield a high diversity of products via a
catalyzed web of reactions on practical time scales.
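
The conversion from product molecule counts to molar concentration in the paragraph above can be sketched as below. This is a minimal illustration that takes the text's own figures as inputs (10^6 enzyme species at 10^14 copies each, a turnover number of 1 per second, a 1,000 microliter volume); it does not derive the enzyme count from the solubility limit. The function name is hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def product_molarity(copies_per_enzyme, turnover_per_s, seconds, volume_litres):
    """Molar concentration of product made by one saturated enzyme species."""
    molecules = copies_per_enzyme * turnover_per_s * seconds
    return molecules / AVOGADRO / volume_litres


TOTAL_ENZYMES = 1e6 * 1e14  # 10^6 species at 10^14 copies each, per the text

# Same total protein shared among 10^6 or 10^8 candidate species.
conc_1e6 = product_molarity(TOTAL_ENZYMES / 1e6, 1.0, 1.0, 1e-3)
conc_1e8 = product_molarity(TOTAL_ENZYMES / 1e8, 1.0, 1.0, 1e-3)

print(f"diversity 10^6: {conc_1e6:.2e} M after 1 s")
print(f"diversity 10^8: {conc_1e8:.2e} M after 1 s")
```

With these inputs the first figure is on the order of 0.1 micromolar, and a 100-fold larger library yields each product 100-fold more slowly, since each species is present in 100-fold fewer copies.
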
In an embodiment of this aspect of the present invention, a
group of enzymes representing a diversity of catalytic activity are
separated in part or in entirety from one another and the substrates
contacted sequentially. For example, a group of enzymes are separated
by membranes, such as dialysis bags, or by immobilizing different
enzymes (representing different catalytic activities) on solid supports,
such as resins. Candidate enzymes can be localized on phage, using
phage display libraries as is known in the art, or other means to
generate and display combinatorially diverse libraries of molecules, such
as peptides or other molecules on beads, or surfaces, or polysome
trapped peptide libraries. Additionally, candidate enzymes may be
contained within or displayed upon one or more types of eukaryotic or
prokaryotic cells, the cells and the substrates being brought into contact.
In any case, a group of substrates is circulated (e.g., by peristaltic
pump) through the separated enzymes. For example, substrates are
circulated in and out of dialysis bags with pore sizes which prevent
escape of the enzymes. Substrates are bound by the first set of
enzymes, modified, released and circulated to the next set of enzymes.
Alternatively, a group of substrates may be confined and enzymes
having one or more catalytic activities circulated through sequentially. In
general, the reactions are conducted over a period of several hours at
temperatures of about 37°C or below. Cofactors such as ATP, NADH, O2
and CoA are added where appropriate. Many of the cofactors may
either be added directly or generated in situ. For example, O2 may be
introduced by injecting the gas or air directly into the reaction mixture or
by use of an electrode to generate O2. An electrode need not directly
contact a reaction mixture, but rather may be introduced into a
compartment from which O2 may pass to the reaction mixture. For
example, an electrode may be placed inside of a dialysis bag which in
turn is surrounded by a dialysis bag containing a set of enzymes. It will
be readily appreciated by those of ordinary skill in the art that a group of
substrates may be subjected to the various separated enzymes in a
variety of orders. Further, it will be evident that after subjecting a group
of substrates to the various separated enzymes, one or more steps may
be repeated if desired. The repetition of steps need not be in the order
initially performed and additional substrates may be introduced at any
step if desired.

In another embodiment of the present invention, a
combinatorial library of organic molecules or other molecules, which are
similar to an initial molecule of interest, are generated by derivatization of
the initial molecule in a very large number of possible ways to produce a
high diversity library of "local" mimics of the initial molecule of interest.
Within the present invention, two ways are provided for generating such
a library, one which does not use enzymes, but uses a variety of
possible adducts or other molecules which may undergo reactions with
the initial molecule of interest, and also uses a variety of chemical
reagents and physical conditions to drive the synthesis of a library of
derivatized products of the initial molecule. Alternatively, the core initial
molecule plus a set of candidate adducts and other molecules which
may react with the initial molecule are used, but also included is a set of
enzymes which may increase the rate of formation of the local high
diversity library of derivatized forms of the initial compound. Based upon
the present disclosure provided herein, it will be readily appreciated by
those of ordinary skill in the art that the methods for producing general
high diversity libraries of product molecules and for producing local high
diversity libraries of derivatized forms of an initial compound may be
combined. For example, a new initial compound may be generated by
the general procedure (e.g., substrates with different core structures).
Such a new compound is then used, with or without derivatives, to
generate a local high diversity library of derivatized forms of the
compound. Further, it will be evident to those of ordinary skill in the art
that libraries may be generated using a combination of the methods
herein without enzymes and the methods herein with enzymes.
Generation of a high diversity library of derivatized forms of
a steroid hormone core, such as estrogen, is used as a non-limiting
example. A set of reactants, including estrogen and a variety of other
small molecules which are candidates to react with estrogen to form new
product molecules partially or entirely containing the steroid core, are
mixed in a common reaction milieu. These are reacted in the presence
of a set of enzymes to catalyze the reactions afforded by the system.
Enzymes can be chosen by a number of means, some known in the art,
others specified herein. The formation of a library of derivatized
molecules under these reaction conditions can be assessed by a
number of means known in the art. For example, the steroid core may
be radioactively labeled at a variety of positions. Thereafter, the reaction
mixture can be subjected to HPLC analysis, mass spectrograph analysis,
or other modes of analysis to test for the diversity of molecules which
are labeled. All radioactively labeled molecules contain atoms derived
from the steroid core, hence the new molecule species are at least
partially comprised of the steroid core. If it is desirable to assure that a
large part of the steroid core is contained in the novel species, then two
or more distinct radioactive labels can be used to label distinct and
distant atoms in the core. Simultaneous presence of all labels suggests
strongly that those portions of the steroid core are intact. Alternatives to
radioactive labels include isotope labels and other means known in the
art. The high diversity library is tested (e.g., by means described herein)
to determine if it contains molecules of interest. If such molecules are
detected, they may then be isolated by a variety of means, including sib
selection as described herein.
The detection of molecules which are candidates to act as
antagonists of estrogen is discussed first as a non-limiting example of
detection of one or more molecules of interest in the library of this
estrogen example. Detection of molecules which have higher affinity
than estrogen for the estrogen receptor (and hence which may be of use
in hormone replacement therapy at lower concentrations and thus lower
side effects than estrogen itself) is discussed as a second non-limiting
example. Detection of candidate antagonists in the reaction mixture may
be accomplished by use of very high specific activity radioactive
estrogen bound to receptors by means known in the art. Unlabeled
competitors in the library will displace the labeled estrogen, and this
competitive interaction can be detected by loss of label.
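
The loss of label on competition can be illustrated with the standard single-site equilibrium expression for competitive binding. The following is a hedged sketch, not a description of any particular assay: it assumes one binding site per receptor, equilibrium, and ligand concentrations well in excess of receptor concentration. All concentrations and dissociation constants are hypothetical values chosen for illustration.

```python
def labeled_occupancy(labeled, kd_labeled, competitor=0.0, kd_competitor=1.0):
    """Fraction of receptor sites holding labeled ligand at equilibrium.

    A competitor raises the apparent dissociation constant of the labeled
    ligand by the factor (1 + [competitor] / Kd_competitor).
    """
    apparent_kd = kd_labeled * (1.0 + competitor / kd_competitor)
    return labeled / (labeled + apparent_kd)


# Hypothetical values: 1 nM labeled estrogen with a 1 nM Kd.
alone = labeled_occupancy(1e-9, 1e-9)
# Add 10 nM of an unlabeled library member that also binds with a 1 nM Kd.
competed = labeled_occupancy(1e-9, 1e-9, competitor=1e-8, kd_competitor=1e-9)

print(f"bound label without competitor: {alone:.3f}")
print(f"bound label with competitor:    {competed:.3f}")
```

In this illustration the bound label falls from one half of the sites to about one twelfth, which is the kind of signal loss the assay relies on.
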
Detection of candidate high affinity agonists for
replacement therapy may be carried out by use of appropriate cell
assays similar to the frog melanocyte assay or the use of pH changes
described in detail herein. Presence of a high affinity agonist in the
reaction mixture is demonstrated because a very low concentration of
the agonist compared to estrogen suffices to trigger the cell response.
Such assays may be carried out in the absence of estrogen, or in the
presence of increasing concentrations of estrogen. In the latter case,
cell response at lower concentrations of estrogen than would elicit a
response with estrogen alone, detects the presence of an agonist in the
high diversity library. If the agonist can act alone to trigger the cell
response, then during the sib selection winnowing procedure, as its
concentration increases the threshold level of estrogen required for
triggering a cell response will dwindle.

The creation of a set of candidate enzymes able to catalyze
reactions derivatizing the core molecule, e.g., estrogen, is carried out by
selecting from a large set of enzymes (for example the mouse or human
immune repertoire), a subset of candidates which bind to the initial set of
substrates, the core molecule plus the candidate substrates which are to
react with the core and derivatize it. This set of enzymes may then
advantageously be enlarged by generating a mutant variant spectrum of
each. The purpose of this step is the following: The enzymes have
been selected because they bind to the substrates of the potential
reactions, rather than selectively binding the transition states of the
reaction. Generation of mutant spectra around each such initial enzyme
which binds substrate(s) increases the probability that the mixture of
enzymes will include candidates which bind the transition state of the
reaction, hence are improved candidates to catalyze the reaction.

In a repetitive procedure, a succession of candidate
enzymes can advantageously be selected as candidates to catalyze the
succession of reaction steps leading away from the core molecule, for
example the steroid core, and the initial adducts, to generate the high
diversity library. At each reaction cycle with a given set of molecules, an
enlarged set of molecules, many derivatized forms of the core molecule,
will be generated. In order to find further enzymes to catalyze the next
reactions afforded by the enriched reaction system to create still further
derivatized molecules of the core, it is advantageous to select from a
high diversity library of candidate enzymes, new candidates which may
act on the newly formed species of product molecules. These new
candidate enzymes plus their mutant spectra, as well as the previously
identified candidate enzymes, may be used in the subsequent reaction
cycle to catalyze the formation of still more kinds of derivatized forms of
the core molecule. Given limited enzyme solubility, in order to keep the
concentrations of critical enzymes as high as possible, it can be
advantageous to utilize only the newly identified candidate enzymes,
plus their mutant spectra identified from the high diversity library of
candidate enzymes, plus the set of candidate enzymes from the last
cycle or few cycles of the reaction sequence leading from the core
molecule and initial adducts. In contrast, candidate enzymes leading
from the initial core and initial adducts can be advantageously
eliminated in later iterative steps, since they have already acted to
catalyze formation of their products.
For example, one means to identify such further enzyme
candidates at each iterative step consists in labeling the substrate and
the product molecules in the reaction mixture, each at a variety of
positions, with radioactive iodine. The purpose of labeling a variety of
positions on each compound with iodine is to assure that the iodine
labeling of at least some members of that species of compounds will not
prevent binding of candidate enzyme molecules at almost any
compound site unlabeled by the iodine label. These labeled molecules
are then reacted with the high diversity of candidate library enzymes, for
example with human antibody molecules, to detect which antibody
molecules bind the labeled molecules from the reaction mixture. This
set includes antibody molecules which bind the novel product molecules
created in the reaction system. The antibody molecules plus their
mutant variants are then used to enlarge the set of candidate enzymes.
A variety of means are known in the art to identify the
antibody molecules which bind iodine labeled molecules in the reaction
mixture. Among these, it is advantageous to use plaque assays or cell
assays expressing the antibody library to test which plaques or cells
bind iodine labeled material. If a fluorescent label is used instead of
iodine, it is advantageous to make use of the natural display of antibody
molecules on cell surfaces of immortalized B cells, where each such
monoclonal antibody producing cell displays its unique antibody. It is
then advantageous to expose the population of cells to the fluorescent
labeled molecules in the reaction mixture, then sort the B cells. Those
immortalized cells which are labeled generate antibody molecules which
bind the labeled molecules from the reaction mixture. These
immortalized cells can be grown to create a library of monoclonal
antibodies which are the candidate enzymes. In addition, it is possible
to select antibodies, or other sequences which constitute the further
enzymes at each iterative step alluded to above by using the product
molecules to create affinity columns, then using the columns to select
subsets of libraries of phage displayed antibody molecules, polysome
trapped antibodies, or libraries of DNA or RNA aptamers, or other
sequences which bind the products on the column hence which may
function as candidate enzymes. Thus, in addition to the use of a high
diversity antibody library to find candidate enzymes, it is also possible to
use other high diversity libraries. Among these, it is preferred to use
high diversity RNA libraries, DNA libraries, and libraries of stochastic
peptides alone or as fusion proteins with a variety of evolved proteins.
Another preferred means to create a set of candidate
enzymes which may help derivatize a core molecule with a set of
adducts or other substrates, consists in using known enzymes involved
in the normal biosynthetic pathway leading to the core, plus mutant
variants of those enzymes. Similarly, known enzymes utilizing any of the
adducts as substrates, plus mutant variants of those enzymes, may be
used. In order to catalyze a succession of reactions from the core
molecule and further novel substrates which may react with it, it is
advantageous to use the substrates and products present at each
iteration of the reaction cycle to identify the enzymes which bind
substrates and/or products, then create further mutant spectra of these
identified enzymes as candidates to catalyze the next reaction steps from
the core molecule. Enzymes which bind substrates and products can be
identified by means known in the art, including binding assays to cloned
enzymes via plaque or other assays. It is also advantageous to use a
set of candidate enzymes formed by the union of a set of known
enzymes and their mutant spectra, as just described, plus a set of
candidates derived from a high diversity library of candidates, such as
the mouse antibody repertoire as described above.
In all the embodiments of this invention it can be
advantageous to use procedures to select substrates at each of the
stages of amplification of diversity which are good candidates to
undergo reactions which yield a desired molecule of interest. A
procedure to do so consists in creating sets of "shape-complements" to
the "shape" of the desired target, then using the sets of
shape-complements to bind and affinity select candidate substrates whose own
"shapes" are similar to the target shape of the desired molecule. As a
non-limiting example, if the target molecule of interest is estrogen, it may
be used to generate a set of monoclonal antibodies against estrogen, or
a polyclonal serum against estrogen. These antibodies can be used to
affinity purify candidate substrates with shapes similar to estrogen.
Reactions building upon these candidate substrates can be carried out,
and the products searched for estrogen mimics.
In addition to antibodies, other shape diversity libraries of
DNA, RNA, or otherwise can be used to find shape-complements to the
target molecule, here estrogen.
This "target shape" procedure can be advantageously
extended in three ways. First, among the antibody molecules binding to
estrogen ("rank one" antibodies), some will bind to the active site or the
vicinity of the active site, and others will bind to other sites. These may
be discriminated by using the antibodies, each as a monoclonal, to
generate antiidiotype antibodies ("rank two" antibodies) by means known
in the art. Any rank one antibody which generates a rank two
antiidiotype antibody that competes with estrogen for the binding site on
the rank one antibody is likely to be a rank one antibody whose binding
site actually binds the active site of estrogen. The set of each such rank
one antibodies can be used to affinity select candidate substrates with
shapes similar to estrogen.
Second, the set of second rank antigens which compete
with estrogen for binding sites on rank one antibodies can be used to
affinity select candidate enzymes which will act on estrogen-like
substrates to yield estrogen mimics.
Third, this set of rank two antibodies can be used to
generate "rank three" antibodies which can be used to affinity select a
wide variety of estrogen-like substrates. In addition to antibody
molecules and antiidiotype antibody molecules, other sets of shape and
shape-complement molecules, including DNA, RNA and other complex
molecules can be used. These can, as one non-limiting example, be
selected from high diversity combinatorial libraries of molecules.

In another embodiment of the present invention, a group of
molecules are used which contain autocatalytic sets, e.g., autocatalytic
sets of catalytic polymers. Reaction mixtures comprise such organic
molecules which are simultaneously substrates and catalysts. Reactions
are carried out in a chemostat under flow conditions. For example,
molecules A, B and C are present wherein molecule B catalyzes its own
formation out of substrate molecule A, and molecule C catalyzes its own
formation out of substrate molecule A. This reaction is carried out in a
chemostat where a receptor molecule, such as acetylcholine receptor, is
affixed to the walls of the chemostat and can bind any molecule that
looks like acetylcholine. In this example, molecule B but not C looks
sufficiently like acetylcholine to bind to the receptor for acetylcholine that
is on the chemostat walls. Under flow conditions, the B molecule will
tend to be selectively retained within the chemostat and the C molecule
will not be retained. This provides selective conditions which lead to
the selective amplification of the B autocatalytic set compared to the C
autocatalytic set. For example, if B, even when bound to the receptor,
acts as a catalyst leading to its own formation, then its retention within
the system is selectively favored, and B is amplified with respect to C.
More generally, in a complex reaction mixture in which molecule B
functions as a catalyst in its own formation out of the complex reaction
mixture, retention of B is selectively favored because it binds to the
receptor for acetylcholine. Thus, in general, by taking a system under
chemostat conditions in which one has a receptor for a molecule X,
where finding analogs of X is of interest (X here is for example
acetylcholine), then this is a general procedure to select among
autocatalytic sets for those sets synthesizing X-like mimics. Hence, this
selective method enhances the capacity to use random complex reaction
mixtures to synthesize drug candidates able to mimic X.
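
The selective retention described above can be illustrated with a toy simulation. The sketch below is an assumption-laden model, not the method of the invention: it uses simple mass-action kinetics, treats receptor binding as a fixed fraction of B that escapes washout (the hypothetical `retained_fraction`), assumes bound B remains catalytically active, and integrates with a plain Euler step. All rate constants are arbitrary illustrative values.

```python
def simulate_chemostat(retained_fraction=0.5, dilution=0.1, rate=1.0,
                       feed=1.0, dt=0.01, steps=20_000):
    """Euler-integrate A -> B and A -> C autocatalysis under flow.

    B and C each catalyze their own formation from substrate A. C washes
    out at the dilution rate; B washes out more slowly because a fraction
    of it is held on the receptor-coated wall.
    """
    a, b, c = feed, 0.01, 0.01           # equal seed amounts of B and C
    for _ in range(steps):
        growth_b = rate * a * b          # autocatalytic formation of B
        growth_c = rate * a * c          # autocatalytic formation of C
        da = dilution * (feed - a) - growth_b - growth_c
        db = growth_b - dilution * (1.0 - retained_fraction) * b
        dc = growth_c - dilution * c
        a += da * dt
        b += db * dt
        c += dc * dt
    return a, b, c


a, b, c = simulate_chemostat()
print(f"with retention:    B = {b:.4f}, C = {c:.6f}")

a0, b0, c0 = simulate_chemostat(retained_fraction=0.0)
print(f"without retention: B = {b0:.4f}, C = {c0:.4f}")
```

In this toy model B and C are kinetically identical, so without retention they stay equal, while any retention of B lets it draw the substrate down to a level at which C can no longer keep pace with washout.
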
In another aspect of the present invention, methods are
provided for generation of new compounds without the use of enzymes.

In one embodiment, the method comprises the steps of (a) reacting a
group of different substrates, the group comprising acids, amines,
alcohols, and unsaturated compounds, under suitable conditions with a
dehydrating agent to yield a first reaction mixture; (b) reacting the first
reaction mixture with a reducing agent under suitable conditions to yield
a second reaction mixture; (c) reacting the second reaction mixture with
an oxidizing agent under suitable conditions to yield a third reaction
mixture; (d) performing a condensation reaction under suitable
conditions upon the third reaction mixture to yield a fourth reaction
mixture; (e) exposing the fourth reaction mixture to light of wavelength of
about 220 nanometers to 600 nanometers, thereby producing one or
more organic molecules different from the substrates and agents; (f)
screening the exposed fourth reaction mixture for the presence of an
organic molecule having a desired property; and (g) isolating from the
exposed fourth reaction mixture the organic molecule having the desired
property.
In another embodiment, the method comprises the steps
of: (a) reacting a group of different substrates, the group comprising
acids, amines, alcohols, and unsaturated compounds, under suitable
conditions with a dehydrating agent to yield a first reaction mixture; (b)
reacting the first reaction mixture with a reducing agent under suitable
conditions to yield a second reaction mixture; (c) reacting the second
reaction mixture with an oxidizing agent under suitable conditions to
yield a third reaction mixture; (d) performing a condensation reaction
under suitable conditions upon the third reaction mixture to yield a fourth
reaction mixture; (e) exposing the fourth reaction mixture to light of
wavelength of about 220 nanometers to 600 nanometers, thereby
producing one or more organic molecules different from the substrates
and agents; (f) screening the exposed fourth reaction mixture for the
presence of an organic molecule having a desired property; and (g)
determining the structure or functional properties characterizing the
organic molecule having the desired property.
In this aspect of the present invention, a group of different
substrates, such as those described above, are subjected to a series of
reaction conditions from which one or more compounds having a
desired property are produced without the use of enzymes. More
specifically, a group of different substrates are reacted under suitable
conditions with a dehydrating agent to yield a first reaction mixture.
Suitable dehydrating agents include carbodiimides, carbonyldiimidazole,
sulfonyl halides, phosgene equivalents and activated phosphoramides,
as well as other agents in common use for solid phase peptide synthesis
and nucleotide synthesis, etc. It will be evident to those of ordinary skill
in the art that the most preferred solvent(s) are dependent upon the
particular group of substrates selected. For example, if all the substrates
are fairly polar in nature, a solvent such as methanol may be used.
Concentrated solutions of individual substrates are made and then the
group of substrates prepared by mixing aliquots of each concentrated
solution. Mixtures of solvents which are miscible with one another (i.e.,
do not form two phases) are appropriate where all the substrates are not
soluble in a single solvent. Examples of solvent mixtures are acetone
and water, dimethyl formamide and water, or ethanol and water.
Reaction conditions may be varied, but generally the reaction will be
performed from about one hour to overnight at a temperature from about
room temperature to the boiling point of the solvent.
The first reaction mixture, such as that described above, is
reacted under suitable conditions with a reducing agent to yield a
second reaction mixture. Suitable reducing agents include dissolving
metals, hydride reagents, molecular hydrogen with suitable metal
catalysts (e.g., platinum, palladium, nickel or rhodium), etc. Examples of
reducing metals include sodium, lithium, potassium, various amalgams,
calcium, iron, and tin. Examples of hydride reagents include sodium
borohydride, lithium aluminum hydride, and borane. Reaction conditions
may be varied, but generally the reaction will be performed from about
one hour to overnight at a temperature of about room temperature or
below (e.g., in an ice bath). It will be evident that certain reducing
agents perform best in certain solvents. For example, where hydride
reagents (such as sodium borohydride) are used, it will be evident that
non-hydroxylic solvents (such as dimethylformamide) are preferred.
The second reaction mixture is reacted under suitable
conditions with an oxidizing agent to yield a third reaction mixture.
Suitable oxidizing agents include ozone, peroxides, chromate,
permanganate, osmium tetroxide, chlorine, bromine, and air in the
presence of suitable metal catalysts (such as ruthenium tetroxide).
Reaction conditions may be varied, but generally the reaction will be
performed from about 1-2 hours to overnight at a temperature of
about room temperature or below (e.g., in an ice bath). It will be evident
that certain oxidizing agents function best in certain solvents. For
example, a mixture of water and alcohol may be used with hydrogen
peroxide, but water only with permanganate, and hexane (or petroleum
ether) with halogens such as chlorine or bromine.
A condensation reaction is performed under suitable
conditions upon the third reaction mixture to yield a fourth reaction
mixture. The third reaction mixture may be subjected to condensation
by dehydrating agents or heat. Suitable dehydrating agents include
molecular sieves, carbodiimides, azeotropic distillation (to remove water),
etc. For example, toluene may be added and then azeotropic distillation
performed to remove water. It will be evident that reaction conditions
vary depending upon the type of dehydration agent used.
The fourth reaction mixture is exposed to light. The light
generally is within a range of about 220 nanometers to 600 nanometers,
which includes portions thereof or discrete wavelengths if desired.
Reaction conditions may be varied, but generally the irradiation of a
reaction mixture will be performed from about 15 minutes to 2 hours at a
temperature of about room temperature or below (e.g., in an ice bath).
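
The five enzyme-free reaction steps described above can be summarized as a small data structure, which may be convenient for planning reordered or repeated runs. This is only an organizational sketch: the `Step` class and `PROTOCOL` name are hypothetical, the agents listed are a few of the examples named in the text rather than a complete list, and the durations and temperatures are the general ranges stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    agent_examples: tuple[str, ...]
    duration: str
    temperature: str


# Order as given in the text; the text notes steps may be reordered or repeated.
PROTOCOL = (
    Step("dehydration", ("carbodiimides", "carbonyldiimidazole"),
         "about 1 hour to overnight", "room temperature to solvent boiling point"),
    Step("reduction", ("sodium borohydride", "lithium aluminum hydride"),
         "about 1 hour to overnight", "room temperature or below"),
    Step("oxidation", ("hydrogen peroxide", "permanganate"),
         "about 1-2 hours to overnight", "room temperature or below"),
    Step("condensation", ("molecular sieves", "azeotropic distillation"),
         "depends on agent", "depends on agent"),
    Step("irradiation", ("light, about 220-600 nm",),
         "about 15 minutes to 2 hours", "room temperature or below"),
)

for number, step in enumerate(PROTOCOL, start=1):
    print(f"{number}. {step.name}: {step.duration}; {step.temperature}")
```
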
All the above-described reactions are generally performed
at ambient pressure. Certain exceptions, such as reduction using
molecular hydrogen, will be evident. It will be readily appreciated by
those of ordinary skill in the art that a group of substrates may be
subjected to the various reaction steps in orders which differ from the
order provided above. Further, it will be evident that after subjecting a
group of substrates to any one or a subset of the various reaction steps
above in any order one or more of the steps may be repeated if desired.
Further, it will be clear that other reagents, used singly, or in mixtures, or
used sequentially, in addition to the above examples, or with the above
examples where practical, can be utilized. The repetition of steps need
not be in the order initially performed and additional substrates may be
introduced at any step if desired. In addition, one or more of the
substrates used initially, or introduced at a subsequent reaction step,
may be generated by any of the methods provided herein, i.e., by
random chemistry with or without enzymes.
As described above, in an embodiment of this aspect of the
present invention, the group of substrates is provided by derivatization of
an initial molecule or a class of molecules. Such a group of substrates
is subjected to the above-described reactions without enzymes to
generate a high product diversity which is centered around the initial
molecule or a class of molecules.
A variety of means are available which allow detection of
low concentrations of one or more species of a desired molecule in a
mixture of molecules generated by the methods provided herein. For
example, a variety of cell systems are well known to those of ordinary
skill in the art which allow detection of low concentration ligands, e.g.,
ligands binding a hormone receptor. In this regard, for example, a
system has been developed which clones human G peptide hormone
receptors into frog melanocytes (Lerner, Proc. Natl. Acad. Sci. USA).
The hormone receptors, typically located in the cell membrane, respond
to binding of the corresponding hormone, but trigger a cell response
releasing or reabsorbing melanophores. In a forty minute reversible
cycle, cells darken dramatically, then can be induced to lighten in color
again. Response of the cell depends upon the affinity of the hormone
for the receptor. Typical responses occur in the nanomolar to 100
picomolar hormone concentration range. For some hormone receptor-
hormone pairs, where affinity is higher, response occurs in the picomolar
hormone concentration range. This cell system is an example of an
assay system which allows detection, in a mixture of molecules, of one
or more species of ligands able to bind to the receptor. The set of
molecule ligands able to bind the receptor are then the ligands of
interest, for they are candidates to act as drugs by antagonizing,
agonizing, substituting for, or modifying the effects of the natural
hormone.
A second example of a cell assay is that available
commercially from Molecular Devices (Palo Alto, CA). It consists of an
array of chemfets which respond to very small changes in local pH. In
turn, these small pH changes reflect the altered metabolic activity of a
population of cells upon receipt of some molecular signal, such as a
hormone binding its receptor. For example, cell assays in which a
hormone binds a receptor are known to those of ordinary skill in the art
and allow nanomolar or subnanomolar concentrations of the hormone
ligand to be detected. A preferred means of using the present invention
consists in exposing such cells to a high diversity library of molecules
generated by the methods provided herein, to detect the presence of
one or more species of molecules able to trigger the cell response. That
set of small molecules, each of which is highly likely to bind the
hormone receptor, are the molecules of interest which may serve as
drugs. Another example is to use blast B cells, which on their surface
express antibodies directed to a molecule of interest, to detect in a high
diversity library the presence of molecules which sufficiently mimic the
molecule of interest to be able to bind to its antibody on a B cell. Thus,
an animal is immunized with a molecule of interest and the early B cells
isolated. A high diversity library of molecules generated by the methods
provided herein is screened using the population of B cells. For
example, binding may stimulate cell cycling or division by the blast B cell
bound. Cell cycling or division may be detected by means known in the
art.
Alternatively, a variety of assays to detect the presence of a
ligand of interest exist which are based on direct binding assays. Thus,
for example, a receptor for a hormone can be used directly to detect
binding of a radioactively labeled ligand. Other means, known in the art,
to accomplish this include the following:
(i) The estrogen receptor is used as a non-limiting
example. The cloned receptor can be affixed to a flat surface, for
example, a filter. Very high specific activity estrogen is prepared, and
bound to the receptor population. This set of bound receptors is then
used in a competitive assay. The bound receptors are exposed to a
library of compounds generated by the methods of the present
invention. If the library contains ligands which also bind the estrogen
receptor, those ligands will compete with the radioactively labeled
estrogen itself for the receptors. Hence the radioactively labeled
estrogen will be competitively displaced from the receptor, and can
readily be detected by means known in the art. Thus, this assay allows
detection of one or more species of ligands in the mixture which
compete with estrogen for the estrogen receptor. This set of ligands is
the set of interest, as they are candidates to be drugs mimicking or
antagonizing estrogen.
(ii) The estrogen receptor is again used as a non-limiting
example. By means known in the art, one raises antibody
molecules which are able to bind the receptor when the receptor is not
bound by estrogen, but not bind the receptor when occupied by
estrogen. Alternatively, one generates antibody molecules which bind
the estrogen receptor only when the receptor itself does bind estrogen.
These antibody molecules can then be decorated with reporter groups
by a variety of means known in the art, and used to detect the presence
of one or more ligand species in a library of high diversity, which bind to
the estrogen receptor. In the case of antibodies which only bind the
receptor if the receptor is itself unbound by estrogens, one tests for loss
of antibody binding in the presence of the library of compounds and in
the simultaneous absence of estrogen. In the case of antibodies which
bind the receptor only if the receptor is bound by estrogen, one tests for
an increase in binding of the antibody in the presence of the receptor
and high diversity library.
(iii) In order to detect ligands in a high diversity library
which are candidates to mimic or antagonize the action of a given
hormone or other molecule of interest, it is advantageous to generate
one or more monoclonal antibodies which bind the hormone or other
molecule of interest. This set of monoclonal antibodies can then be
used, rather than a receptor, for the target molecule that is to be
mimicked, in binding assays such as those noted above to detect the
presence of one or more ligand species in the reaction mixture which
are candidates to mimic or antagonize the action of the target molecule.
An advantage of this procedure is that a receptor for the target molecule
need not be available. Use of a set of monoclonal antibodies is
advantageous because, a priori, it is not certain which molecular feature,
or epitope, of the target molecule mediates its biological action. Use of
a set of monoclonal antibodies, each responding to a different epitope
on the target molecule, enhances the probability that the ligands
detected in the high diversity library will include those which mimic the
biologically important epitope of the target. In some cases it may be
possible to selectively use only those monoclonal antibody molecules
which bind to the known important epitope of the target molecule.
(iv) Means are established in the art to detect protein-
protein binding based on plasmon resonance and detection of a shift in
refractive index. In a detection system developed by Pharmacia
(Piscataway, NJ), a monoclonal antibody, or a hormone receptor, is
layered onto a gold chip. Binding of hormone, or other ligands to a
receptor, is detected in very low concentrations (e.g., in the nanogram
range or less). Thus, any receptor, or antibody, or other "shape
complement" of a target molecule of interest can be placed on the gold
chip, the latter can be exposed to a high diversity library, and the
presence of liganding species can be detected.
Another example of direct measurement of ligand-binding,
which the applicant believes was developed by Evotech, can measure
ligand binding in the femtomolar range. Rudolph Rigler of the
Karolinska Institute in Stockholm has described a laser assay system in
which a laser is focused on an approximately 1 cubic micron volume of
fluid, and can detect the presence of fluorescently labeled compounds at
femtomolar concentrations, 10^-15 M, in tens of seconds. By fluorescent
labeling of small "shape-complement" molecules of a desired target
molecule, the binding of a target-mimic molecule to the shape-
complement can be detected through alteration of the diffusion of the
ligand-bound versus free shape-complement molecule. Thus, if estrogen
is the target molecule, and a small RNA aptamer is the shape-
complement which binds estrogen, then fluorescently labeled versions of
that RNA aptamer can be used in Rigler's system. An estrogen-mimic
which binds the fluorescently labeled RNA will slow its diffusion as
detected in the laser system. Thus estrogen-mimics at very low, 10^-15 M
or femtomolar, concentrations can be detected.
A further means to detect ligands of interest at very low
concentrations consists in seeking ligands which block a DNA
polymerase. By blocking the DNA polymerase chain reaction (PCR)
enzyme, amplification of the DNA can be blocked. Since PCR
amplification can yield billions or more copies of the initial DNA
sequence, blocking PCR amplification yields a readily detectable signal
of a ligand which blocks the polymerase. Clearly, this method
generalizes to other means to amplify DNA, RNA, or DNA- or RNA-like
molecules such as ligation amplification, and extends to general means
to block polymerases directly or indirectly with ligands of interest.
Given that the diversity of the library of molecules which
must be tested for molecules of interest is related inversely to
concentrations, and given the requirement that the founding
substrates must be jointly soluble in the reaction mixture, then driving
the detection level to very low concentrations permits the invention to be
utilized to explore libraries of extremely high diversities. Diversities of
10~5 can be generated, and the presence of ligands of concentrations of
10^-15 to 10^-16 M can be both detected and generated from initial millimolar
mixtures of 1,000 to 100 substrates. Additionally, with a sufficiently high
diversity of enzymes or reaction conditions, a high diversity library may
be generated with a founder set of organic compounds with a diversity
as small as 10.
As described above, compounds of interest in the high
diversity library may act as catalysts for a desired reaction, or as
cofactors with other molecules to form an active catalyst. Other
molecules may act as inhibitors of enzymes. In order to exclude the
possibility that the enzymes or catalysts are found among the candidate
set of enzymes which may have been used to generate the library, the
latter set of enzymes can be quantitatively removed from the high
diversity library by affinity columns bearing molecules directed to a
constant part of each of the set of enzymes, or other means known in
the art. The resulting high diversity library itself is then assayed for
candidates of interest.
Detection of molecules able to inhibit an enzyme may
proceed by detecting ligands able to bind the enzyme, as described
above. Identifying molecules which are candidates to catalyze a reaction
alone or as a cofactor, may proceed by testing high diversity libraries
alone, or in the presence of a helper molecule, say a protein, for which a
desired molecule will be a cofactor. The system is tested for the
presence of ligands able to bind a stable analogue of the transition state
of the reaction. Such binding molecules are the candidate catalysts or
cofactors sought, for they are candidates to catalyze the reaction itself.
Alternatively, a variety of means are known in the art which
allow detection of the products of a catalyzed reaction itself. For
example, chromogenic or fluorogenic substrates for a variety of reactions
of interest are available. Catalysis of the reaction increases the rate of
formation of the colored or fluorescent product. Alternatively, assay
systems are available or readily prepared which detect the presence of a
product molecule because that product molecule binds a receptor, an
antibody molecule, or other shape complement. Thus, detection of
higher rates of formation of that product molecule demonstrates that the
reaction itself was catalyzed.
Following the generation of high diversity libraries of
compounds and the screening for the presence of compounds having
properties of interest, such compounds of interest are characterized with
or without isolation. A variety of means, including those known in the
art, are available to characterize or isolate such compounds of interest.
Characterization and/or isolation depend upon the
information desired, and can be carried out at different mole abundances
of the target molecule of interest. Thus, using modern mass
spectrograph analysis, about 10^-15 to 10^-18 moles can be assayed for
mass and charge, then fragmented in a variety of ways known in the art
and the fragments assayed for mass and charge. Using this data, it is
possible to derive the structure of the molecule of interest. For example,
ligands of interest may be isolated by binding to a given hormone
receptor, or monoclonal antibody, then the liganding molecules released
by means known in the art, and finally characterized analytically. One
means comprises attaching a target receptor or antibody to a solid
support. A reaction mixture or subset thereof is contacted with the solid
support. Those molecules that are bound will be retained, while the
non-bound molecules are readily separated from the solid support. The
molecules of unknown structure which have been retained are then
eluted. The freed molecules are characterized analytically, e.g., by mass
spectroscopy, NMR, IR, UV, and may be synthesized in batch quantities.
Examples of analytic techniques involving mass spectrometry include
gas chromatography-mass spectrometry (GC-MS), HPLC-mass
spectrometry (LC-MS), and field desorption mass spectrometry (FD-MS).
In other cases, the concentrations of molecules of interest
in the high diversity library will allow detection of their presence, but may
be too low for further isolation or characterization. A preferred
procedure called "sib selection" allows ready winnowing of the set of
candidate enzymes, the set of founder substrates, and the set of reaction
conditions and chemical reagents, to smaller sets. This winnowing
simultaneously reduces the side products generated in the high diversity
library, increases the concentration of the target molecule of interest,
identifies the subset of candidate enzymes which catalyze the pathway
leading to synthesis of the target molecule, and identifies the set of
founder substrates required for synthesis of the desired target. Thus,
this sib selection procedure is a means to generate a previously
unknown molecule of interest, as well as identify both that molecule and
the substrates and enzymes needed to form that molecule.
A library, where the target of interest is a molecule which
binds the estrogen receptor, is used as a non-limiting example. For
example, a high diversity library derived from D and L amino acids,
including "unnatural" amino acids, and small peptides which may be
composed thereof is provided by the methods described herein. Such a
library will contain linear, branched, cyclic and other singly or multiply
constrained forms due to formation of disulfide (S-S) intramolecular
bonds.
An aspect of the present invention where substrates and
candidate enzymes are used is discussed first. Further below, another
aspect of the present invention where candidate enzymes are not used,
but one or more reagents or reaction conditions are used, is discussed.
The presence of one or more ligands for the estrogen
receptor is detected in the high diversity library of this example by any of
the means described above, or any other means. The set of candidate
enzymes and set of founder substrates suffice to lead to reactions which
generate the desired ligands. As a non-limiting example, a set of four
reaction steps, using seven of the initial substrates at different reaction
steps, may lead to the desired target molecule. By winnowing down the
set of initial substrates to the seven needed and the set of four enzymes
needed, the target molecule may be synthesized in high concentrations.
High concentrations may be achieved because, given the solubility
limits, higher concentrations of the seven critical substrates may be
attained than when 1,000 initial substrates were used, and because only
the four critical enzymes would be present.
Sib selection achieves this winnowing. One may start with
the candidate set of enzymes, but could equally easily start winnowing
the set of substrates. The set of candidate enzymes can be derived, for
example, from a cloned polynucleotide library. Thirty-two aliquots are
created, each of which contains a random half of the initial diversity of the
candidate enzyme library. Thus, if the initial enzyme library diversity was
1,000,000, thirty-two aliquots are created, each containing a diversity of
500,000 candidate enzymes. The chance that any aliquot has the four
critical enzymes is therefore 1/16. Hence, on average, 2 of the 32
aliquots have the four critical enzymes. The full set of initial substrates
is added to each aliquot, the reactions run, then each aliquot tested for
the presence of the desired target molecule which binds the estrogen
receptor. One or two of the aliquots are positive. Each of these aliquots
has decreased the diversity of candidate enzymes by a factor of two,
from 1,000,000 to 500,000. One of the aliquots which is positive is
chosen. The other can be stored for later analysis. Again 32 aliquots
are created, each again having a random half of the remaining candidate
enzyme diversity. Hence each of the 32 aliquots now has a diversity of
250,000 candidate enzymes. Each is again tested for formation of the
target molecule which binds the estrogen receptor. Therefore, in a
logarithmic number of iterations, the set of candidate enzymes may be
winnowed down to the four needed to catalyze the synthesis of the
target molecule. In the present case about 18 iterations are required.
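The arithmetic of the enzyme winnowing described above can be checked with a short sketch. It assumes, as a simplification not stated in the text, that each critical enzyme falls into a given aliquot independently with probability equal to the sampled fraction; the function names are illustrative only.

```python
def hit_probability(fraction: float, n_critical: int) -> float:
    """Chance that one random aliquot holds every critical member.

    Assumes each member is included independently with probability
    `fraction`, a reasonable approximation for a large library.
    """
    return fraction ** n_critical


def expected_positives(n_aliquots: int, fraction: float, n_critical: int) -> float:
    """Expected number of aliquots that hold all critical members."""
    return n_aliquots * hit_probability(fraction, n_critical)


def rounds_needed(initial: int, target: int, fraction: float) -> int:
    """Rounds of sampling needed to shrink `initial` down to `target`."""
    rounds, size = 0, float(initial)
    while size > target:
        size *= fraction
        rounds += 1
    return rounds


p = hit_probability(0.5, 4)                 # the 1/16 chance per aliquot
positives = expected_positives(32, 0.5, 4)  # expected positives among 32 aliquots
rounds = rounds_needed(1_000_000, 4, 0.5)   # halvings from 1,000,000 down to 4
print(p, positives, rounds)
```

Under these assumptions the sketch reproduces the figures in the text: a 1/16 chance per aliquot, 2 expected positive aliquots, and about 18 halvings.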
This winnowing procedure, therefore, allows the isolation of
a set of enzymes needed to synthesize a target molecule of interest.
Thereafter, mutation, recombination and selection can be used on this
set of enzymes to increase their efficiency and specificity in producing
the target molecule. Thus, this procedure yields an efficient set of
enzymes for later synthesis of the target molecule from its progenitor
substrates. In a further use of the present invention, mutant forms of
these enzymes can be utilized to catalyze a related family of reaction
steps leading to variant forms of the target molecules. Those variants
may be more useful than the initial molecules.
In this example, the set of substrates may also be
winnowed to the seven needed. This winnowing can occur either before
or after the set of enzymes is winnowed. The process is the same.
Thirty-two aliquots are created, each containing a random 80% of the
1,000 initial substrates. The chance that any aliquot contains the seven
critical substrates is 0.8^7, or about 0.21. Thus, on average one or more of the aliquots
contains the requisite set of 7 substrates. Each aliquot is tested for the
presence of the target molecule of interest that binds the estrogen
receptor. A positive aliquot is chosen. Thirty-two aliquots are again
generated, each containing a random 80% of the remaining, now
reduced, substrate diversity. The aliquots are again tested for those
which contain the target molecule of interest. In a logarithmic number of
steps it is possible again to winnow to the seven critical initial substrates.
The number of steps is modest.
It is clear that the fraction of the candidate enzymes or
initial substrates used in each aliquot at the first winnowing step and
each step thereafter can be chosen such that the expected number of
aliquots which form the desired molecule is one or greater than one at
each step of the winnowing process.
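As a hedged illustration of how the sampled fraction might be chosen, the sketch below computes the smallest fraction for which at least one positive aliquot is expected, and applies it to the substrate example (32 aliquots, 80% sampling, seven critical substrates). It assumes independent inclusion of each critical member; the helper names are hypothetical and are not part of the described procedure.

```python
import math


def min_fraction(n_aliquots: int, n_critical: int) -> float:
    """Smallest fraction f for which n_aliquots * f**n_critical >= 1."""
    return n_aliquots ** (-1.0 / n_critical)


def expected_positives(n_aliquots: int, fraction: float, n_critical: int) -> float:
    """Expected number of aliquots holding every critical member."""
    return n_aliquots * fraction ** n_critical


def rounds_needed(initial: int, target: int, fraction: float) -> int:
    """Rounds r at which initial * fraction**r first drops to target or less."""
    return math.ceil(math.log(target / initial) / math.log(fraction))


substrate_hits = expected_positives(32, 0.8, 7)   # expected positive aliquots
substrate_rounds = rounds_needed(1000, 7, 0.8)    # steps from 1,000 down to 7
f_enzymes = min_fraction(32, 4)                   # four critical enzymes
f_substrates = min_fraction(32, 7)                # seven critical substrates
print(substrate_hits, substrate_rounds, f_enzymes, f_substrates)
```

With these assumptions, sampling 80% of the substrates leaves several positive aliquots expected per round, and a little over twenty rounds reduce 1,000 substrates to seven. Smaller fractions speed the winnowing but lower the expected number of positive aliquots toward one.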
In modes of generating a high diversity library where no
candidate enzymes are used but one or more reaction conditions and
reagents are used, the set of initial substrates may be winnowed using
the sib selection procedure described above. This increases the
concentration of the target molecule because the diversity of molecules
present and resulting side reactions is sharply reduced. In addition, in
advantageous cases it may be possible to winnow out those reagents or
physical conditions not needed to synthesize the target molecule.
One aim of the sib selection procedure is to obtain a
sufficient abundance of the target molecule for its characterization and
synthesis by independent means known in the art. Typically microgram
or milligram quantities are sufficient for such analysis by standard
techniques. As noted, it may often be possible to deduce structure and
composition from far smaller quantities by mass spectrographic analysis
or other means known in the art.
It will be appreciated that it is not necessary to actually
isolate a compound to homogeneity from a reaction mixture where
sufficient information about the compound or its functional properties
can be accumulated in its less than purified state. For example,
sufficient structural information may be obtainable using analytical
techniques appropriate for mixtures of compounds. Alternatively, a
compound in a reaction mixture may be characterized functionally (e.g.,
it is defined by the set of molecules with which it is capable of
interacting). For example, a compound in a reaction mixture may
interact with a particular amino acid or small sequence of a polypeptide,
resulting in enhanced or diminished function of the polypeptide. For
example, the compound might be a suicide substrate which covalently
links to a polymer near the catalytic site. Such a bound suicide
substrate may be used to identify catalysts with a desired activity, or to
characterize features of the active site of such a polymer. The site of
interaction on the polypeptide may be detected by analytic techniques
which are capable of detecting perturbations to individual amino acids or
regions of the polypeptides. This information regarding the locus for
alteration of the polypeptide's function (i.e., information about the target)
may be equally or more important than the structure of the compound in
the reaction mixture which interacted with the polypeptide. It will be
evident that, based on this type of information, one may modify a
particular amino acid or region of a polypeptide in a variety of ways.
The following examples are offered by way of illustration
and not by way of limitation.

EXAMPLES

EXAMPLE 1
Preparation of Ubiquitin Fusion Libraries
With Diversity of 1x10^7

The single-stranded DNA needed for 38, 71, and 104 amino
acid polypeptide libraries is synthesized. The total diversity is on the
order of 10^15. PCR amplification is carried out by routine methodology.
Ligation and transformation, without attempts to optimize,
ligates on the order of 10^7 random sequences into plasmid, and after
transformation yields about 30,000 clones. An efficiency of
about 10,000,000 to 100,000,000 transformants per ug of plasmid DNA is
attainable (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d
ed., Cold Spring Harbor Laboratory Press, 1989). Using 50 ng per
transformation, 500,000 to 5,000,000 clones per transformation are
achieved. Transformation may be optimized by (i) purifying the insert
DNA, (ii) optimizing ligation conditions, or (iii) optimizing transformation
technique and conditions. Even at unoptimized efficiency, a polypeptide
diversity of 1,000,000 with thirty transformations is attained.
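The clone yields quoted above follow from simple proportionality, sketched here as a check. The function name is illustrative, and the sketch assumes that yield scales linearly with the mass of plasmid DNA used.

```python
def clones_per_transformation(transformants_per_ug: float, dna_ng: float) -> float:
    """Clones expected from one transformation, assuming linear scaling."""
    return transformants_per_ug * dna_ng / 1000.0  # 1000 ng per ug


low = clones_per_transformation(1e7, 50)    # lower bound of the quoted range
high = clones_per_transformation(1e8, 50)   # upper bound of the quoted range
unoptimized_total = 30 * 30_000             # thirty unoptimized transformations
print(low, high, unoptimized_total)
```

The optimized range is 500,000 to 5,000,000 clones per transformation, and thirty unoptimized transformations of about 30,000 clones each give 900,000 clones, on the order of the 1,000,000 diversity stated.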
On average, each sequence among the 10^7 ligated is
unique. The diversity obtained is tested by counting total transformants
created, sampling random ampicillin resistant clones, carrying out
plasmid preparations, restriction mapping and screening for inserts. This
allows calculation of the total number of transformed clones obtained,
but, since any sequence might be present in multiple copies, the total
alone does not yet specify the total diversity.
Clone redundancy in the library is tested using plasmid
preparations of a pool of 5,000 plasmids. Redundancy among these
distinct plasmids is tested via hybridization with the unique random DNA
region from each of several specific plasmids among the 5,000. To carry
this out, 5,000 transformed colonies are grown on a single plate, lifted
onto nylon filters (GeneScreen Plus, DuPont), the cells lysed, the DNA is
UV-crosslinked to the filter, washed, the DNA denatured with NaOH, and
then neutralized. Thereafter hybridization is carried out under stringent
conditions with radiolabeled unique DNA probes purified from each of
several plasmids among the 5,000. Probe DNA is cut from the adjacent
ubiquitin sequences and gel purified prior to labeling. Probe is labeled
by random primer labeling (Prime-It, Stratagene Cloning Systems).
Autoradiography of the resulting filters reveals if any insert DNA
sequence occurs in an expected one, or many among the 5,000 colony
diversity on the plate. Given the distribution of numbers of colonies
bound for each of 10 to 20 probe insert DNA sequences, the expected
diversity of the library may be calculated based on maximum likelihood
methods.
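One way such a maximum likelihood calculation might be set up is sketched below. The text does not specify the model; this sketch assumes that all sequences are equally abundant, so that the number of additional colonies matching a probe (beyond the colony it came from) is approximately Poisson with mean (N - 1)/D, for N colonies and diversity D. The function name and the hit counts are hypothetical.

```python
import math


def estimate_diversity(hits_per_probe, n_colonies=5000):
    """Maximum likelihood diversity under an equal-abundance Poisson model.

    hits_per_probe[i] counts the colonies hybridizing to probe i,
    including the colony the probe was taken from (each entry >= 1).
    """
    extras = [h - 1 for h in hits_per_probe]
    if min(extras) < 0:
        raise ValueError("each probe must hit at least its own colony")
    total = sum(extras)
    if total == 0:
        return math.inf  # no duplicates seen: only a lower bound is implied
    mean_extra = total / len(extras)       # MLE of the Poisson mean
    return (n_colonies - 1) / mean_extra   # invert mean = (N - 1) / D


# Hypothetical autoradiography counts for ten probes (not data from the text).
hypothetical_hits = [1, 1, 2, 1, 1, 1, 1, 1, 1, 1]
estimate = estimate_diversity(hypothetical_hits)
print(estimate)
```

A library with unequal sequence abundances would need a richer model; the equal-abundance case is used here only because it gives a closed-form estimate.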

EXAMPLE 2
Generating A Diversity of Product Molecules

The combinatorics of the libraries described in Example 1
are tested for the onset of catalyzed reactions as libraries of polymers
act on one another. The number of possible interactions is enormous.
For example, for ligation reactions involving two DNA substrates and one
polypeptide catalyst, the combinatorics admit of 10^21 possibilities of
interactions in the DNA and peptide libraries of a 10,000,000 diversity.
Even where the probability that an arbitrary polypeptide catalyzes a
given ligation reaction is 104 (an estimate based on the ease of finding
catalytic antibodies), a very large number of distinct reactions are
catalyzed. Although the combinatorics favor the onset of catalyzed
reactions, as the diversity of reactants increases, the concentration of
any type of sequence decreases proportionally. For bimolecular
reactions, the forward rate decreases as the square, for trimolecular
reactions the rate decreases as the cube of the falling concentrations.
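The counting and the rate scaling described above can be sketched as follows. The sketch counts ordered substrate pairs, which reproduces the 10^21 figure, and assumes that total concentration is held fixed so that each species' concentration falls in proportion to the diversity; the function names are illustrative.

```python
def interaction_count(substrate_diversity: int, catalyst_diversity: int,
                      n_substrates: int = 2) -> int:
    """Possible (substrates..., catalyst) combinations, counting ordered substrates."""
    return substrate_diversity ** n_substrates * catalyst_diversity


def relative_forward_rate(diversity_factor: float, molecularity: int) -> float:
    """Forward rate relative to baseline when diversity rises by
    `diversity_factor` at fixed total concentration."""
    return diversity_factor ** (-molecularity)


triples = interaction_count(10**7, 10**7)     # two DNA substrates, one catalyst
bimolecular = relative_forward_rate(10, 2)    # tenfold more diversity, two reactants
trimolecular = relative_forward_rate(10, 3)   # tenfold more diversity, three reactants
print(triples, bimolecular, trimolecular)
```

A tenfold increase in diversity therefore slows a bimolecular reaction a hundredfold and a trimolecular reaction a thousandfold, while the number of possible interactions grows.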
Using an estimate of the probability of catalysis of 10~ and
seeking two substrate reactions such as ligation, transesterification, or
transamination to score, the desired product concentrations and
catalyzed reactions may be achieved with diversities of 10~ in both the
DNA and polypeptide libraries. For unimolecular reactions such as
cleavage, or phosphorylation, diversities on the order of 105 to 10~ in
both the substrate and catalyst library are needed.
A first set of experiments utilizes single stranded DNA
sequences as substrates. Subsequent experiments use polypeptides as
substrates. This choice is made for three reasons. First, production of
novel DNA sequences, whose length differs from the initial set of
substrates, all of identical length, is easy to detect on sequencing gels.
Second, single stranded DNA, like RNA, is able to fold into complex
structures (Lu et al., J. Mol. Biol. 223:781-789, 1991), and hence affords a
wider variety of sites for binding and catalysis than double stranded DNA
sequences of the same total diversity. Third, single stranded DNA is
easier to obtain from the libraries than the corresponding RNA, and
somewhat more stable against degradation. Nevertheless, RNA of high
diversity specified by the libraries may be purified. Alternatively, DNA
sequences may be modified to include RNA polymerase promoter sites
such as T7 to allow in vitro RNA transcription (Ellington and Szostak,
Nature 355:850-852, 1992), and obtain high diversity RNA libraries for
use as substrates. Thus, protocols are stated in terms of single stranded
DNA substrates, but single stranded RNA libraries may also be used.
The plastein reaction (Wang et al., Biochem. Biophys. Res.
Commun. 57:865, 1974; Silver and James, Biochemistry 20:3177, 1981)
is a general model for the experiments. In this reaction, protein
substrates are incubated with trypsin, which cleaves the substrates to
smaller peptides. Since any enzyme catalyzes forward and reverse
reactions, trypsin is capable of catalyzing ligation of larger polypeptides
from the smaller peptide fragments. It has been found that dehydrating
the reaction mixture, to shift the equilibrium in favor of synthesis, suffices
for trypsin to catalyze ligation and transamination reactions leading to
formation of high molecular weight polypeptides in the absence of ATP
hydrolysis (Levin et al., Biochem. J. 63:308, 1956; Neumann et al.,
Biochemistry 73:33, 1959). If the high molecular weight material is
removed and the reactants again concentrated, further high molecular
weight polypeptides are formed. Absence of a requirement for ATP
hydrolysis is not too surprising, since transamination reactions can
proceed without net formation of new peptide bonds.
In the first set of experiments, single stranded DNA
sequences of constant length from the libraries, end labeled after the
reaction in one set of experiments, and uniformly labeled prior to the
reaction in another set of experiments, are incubated with 32P
nucleotides. The substrates are then incubated with affinity purified
polypeptides from the libraries of Example 1 with length 38, 71, and 104
of tuned diversities in the ranges noted. Divalent cations, such as
Mg++, Pb++, Mn++, as well as ATP as a potential energy source may
be included. In addition, concentrations of DNA substrates and
polypeptides are tuned over a range sufficiently broad to include
conditions under which biological polynucleotides are cleaved or are
ligated in vitro. In ligation reactions, typical DNA "ends" concentrations
are nanomolar. In a variety of reactions, typical enzyme concentrations
are micromolar or higher. DNA substrate ranges in the nanomolar
concentrations are easily created under the present experimental
conditions. For polypeptides from the 71 amino acid library, a diversity
of 10,000 polypeptides at 1.0 mg/ml yields about a 10.0 nanomolar
concentration for each fusion protein. Therefore, reactions catalyzed
efficiently by such novel enzymes produce product 100 times slower
than were their concentrations higher. In typical DNA cleavage or

~ WO 94/24314 216 0 4 S 7 PCT/US94/04314


ligation reactions, subst~ntial product can be ~etec~ed after times on the
order of minutes. Thus, in general, detect~hle products are seen on the
order of h~"~JIeds of minutes to thousand of minutes.
The polypeptides in the library catalyze, for example,
cleavage, ligation or transesterification reactions among the single
stranded DNA target molecules. Of these, cleavage is energetically
favored in an aqueous medium, while transesterification reactions, like
transamination reactions among polypeptides in the plastein reaction,
are nearly constant energetically in aqueous media. In addition, a
variety of crosslinking reactions between two single stranded substrates
may occur. Transesterification reactions between two substrate
sequences of length L can yield two product molecules, one of which is
larger than either of the two substrate sequences. The beginning library
of DNA molecules are all of the identical length. Thus, on a large 38 cm
polyacrylamide gel (BRL Sequencing Apparatus) run under denaturing
conditions, the entire library runs as a single band. However, where the
polypeptides catalyze cleavage, ligation, transesterification or
crosslinking reactions with DNA molecule substrates, new shorter or
longer DNA sequences appear on the gel. Using standard DNA
sequencing on long gels, bands which differ by a single nucleotide can
be discriminated over about a 400 base range. The gel is run to adjust
the position of the random library full length single stranded DNA
sequences at a desired position on the gel. Using aliquots of the same
reaction mixture sampled at the same moment, and running gels for
different durations, a large range of molecular weights are scanned for
novel bands. As noted, all products of reactions in one set of
experiments are end labeled, since uniform labeling of substrate
sequences prior to reaction with 32P may induce radiation breaks in
single stranded substrates. The end labeled material should be stable,
but less label is present on the gel, rendering detection more difficult,
and only one fragment of a cleavage reaction is visible. Uniform labeling
achieves higher specific activity and legitimately marks reactions yielding
product molecules which are larger than our single stranded substrates.
In order to assure that the new molecular size classes
represent de novo catalysis due to the polypeptide library, control
reactions are carried out using a control library encoding ubiquitin alone.
If affinity purified ubiquitin alone, derived from the control library,
catalyzes reactions among the DNA substrates, then this can be
controlled for in two ways. First, novel random peptides are cleaved free
from ubiquitin as noted above, the novel peptide fragments repurified by
size under non-denaturing conditions, and retested for catalysis using
these random peptides freed of ubiquitin. Second, the particular
reaction substrates acted on by ubiquitin or cell background material can
be identified by a logarithmic dilution technique, as described below,
and eliminated from the DNA substrate library.
A number of features of this system may be assessed.
First, the probability that a protein catalyzes a detectable reaction on
DNA substrates may be estimated. At low diversities of the libraries, the
appearance of a few distinct bands of lower or higher molecular weight
than the initial DNA substrate library may be seen. Where these are the
only reactions catalyzed, then as the incubation period increases, no
further bands appear. Each cleavage reaction involving a single DNA
substrate may give rise to two product sequences. Transesterification
reactions between two substrates again give rise to two product
sequences per reaction. Crosslinking and ligation reactions yield one
new product sequence. The products of single crosslinking and end
ligation reactions among a uniform set of single stranded DNA sequences
of length L should all have a total length of 2L nucleotides. Therefore,
for new bands corresponding to lengths less than 2L, the number of
reactions is estimated as half the number of such new bands. Using this
data, one may estimate the probability that an arbitrary polypeptide
catalyzes a detectable reaction. (Some crosslinked DNA sequences with 2L
nucleotides may have aberrant migration characteristics, perhaps leading
one to count them erroneously as products of transesterification
reactions. This could cause a two-fold error in the estimated
probability.) Second, this estimated probability may be confirmed by
increasing the substrate and polypeptide diversity. Third, by tuning
polymer length at constant diversity, the effective number of substrate
sites and of catalytic sites may be measured as a function of polymer
length.
In an additional set of experiments to test whether the set
of polypeptides catalyzes reactions, unlabeled single or double stranded
DNA sequences of constant length derived from the libraries are
incubated with 32P labeled nucleotides or short oligonucleotides,
acrylamide gels are run, and the labeled material is tested for
incorporation into large molecular weight DNA material.
A new general "logarithmic dilution" procedure is carried
out to isolate both the specific polypeptide(s) catalyzing any specific
reaction, and the specific substrates involved. The procedure introduced
here also serves to isolate both the specific set of substrates and the
specific set of novel enzymes leading to the synthesis of a target
molecule of interest.
To carry out this procedure, divide the total diversity of the
initial cloned polypeptide library into four different aliquots, each
containing a random half of the total diversity of the polypeptide library.
Aliquots may be created which reduce total diversity by random halves
by knowing the diversity of the library, and the number of copies of each
sequence by methods known to those skilled in the art.
For reactions with two substrates and one enzyme, the
probability that any random half of the diversity of the polypeptide library
has the requisite enzymatic polypeptide is 0.5. Thus, on average, two of
the set of four random half-library aliquots contain the required
polypeptide. If no random halved aliquot has the required polypeptide, a
larger number of halved aliquots is tested. Each new diminished library
is incubated with the full set of single stranded DNA substrates, and the
products analyzed on a long sequencing-type gel. On average, for two
such gels, the desired product of the reaction continues to be present.
Thus, the corresponding half polypeptide library contains the polypeptide
which catalyzes the reaction. That now diminished library is again
divided into four random halves in four aliquots. Each is incubated with
the full set of DNA substrates, the gel run and the product identified if
formed in at least one of the four aliquots. By a logarithmic number of
halvings of the initial polypeptide library, the single polypeptide
catalyzing a specific reaction is isolated. Simultaneously, the fusion gene
encoding this polypeptide is isolated. Thus, if the polypeptide diversity is
on the order of 10,000, then about 13 halvings suffice.
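The halving procedure can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the function name `isolate_catalyst`, the `yields_product` callback (a stand-in for running the reaction and reading the gel), and the library index `4242` are hypothetical and do not come from the specification. Exactly one library member is assumed to be active.

```python
import random


def isolate_catalyst(library, yields_product, n_aliquots=4, seed=0):
    """Narrow `library` to one member by repeated random halving.

    `yields_product(aliquot)` is a hypothetical stand-in for the gel assay:
    it returns True when the aliquot still forms the desired product.
    """
    rng = random.Random(seed)
    pool = list(library)
    rounds = 0
    while len(pool) > 1:
        half = (len(pool) + 1) // 2  # ceil(n / 2): size of each random half
        positive = None
        while positive is None:
            # Draw a batch of random half-libraries; if none of them is
            # positive, draw a further batch (a "larger number" of aliquots).
            for _ in range(n_aliquots):
                aliquot = rng.sample(pool, half)
                if yields_product(aliquot):
                    positive = aliquot
                    break
        pool = positive
        rounds += 1
    return pool[0], rounds


catalyst_id = 4242  # hypothetical index of the single active polypeptide
found, rounds = isolate_catalyst(range(10_000), lambda a: catalyst_id in a)
print(found, rounds)
```

Because each round keeps half the pool (rounded up), the number of rounds depends only on the starting diversity; for 10,000 members it is close to log2(10,000), which is about 13.3.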
In the same way, the specific substrates for the reaction in
question are obtained. For two substrate reactions, eight random halves
of the DNA substrate library are progressively formed. The probability
that any aliquot contains the two substrates is 0.25, hence on average
two of the eight have the two substrates. These aliquots are incubated
with the now known catalytic polypeptide, gels are run, and the aliquot
exhibiting the desired reaction product is confirmed, thereby concluding
that the corresponding half of the substrate diversity contains the desired
two substrates. Over a logarithmic number of successive rounds, the two
substrates are thus isolated.
As noted, a main virtue of this approach is that it is
possible to carry it out for any set of molecule substrates, and any set of
polypeptide, RNA, or other potential catalysts. In short, where a diversity
of new products are formed under these experimental conditions, and
where one such product is of interest and can be reliably found in the
product mixture after reaction, then a modest number of halving steps
isolates both the substrates for and enzymes for the reaction leading to
the product. This approach generalizes to cases in which several
enzymes carry out a succession of reactions from an initial set of
substrates. It is merely necessary to alter the random fraction of the
diversity in each aliquot, and number of aliquots at each step, to assure
that at least one such aliquot contains the requisite set of substrates or
enzymes. At any diversity, a logarithmic number of steps is required to
isolate both the set of substrates and the set of enzymes leading to
synthesis of a desired novel target compound.
The polypeptide libraries of tuned diversity may be
permitted to act on themselves as substrates. Many of the same
considerations apply to polypeptide and DNA sequences as substrates
for reactions. Cleavage is energetically favored in aqueous medium,
while transamination reactions are energetically neutral. Thus, as noted,
in the plastein reaction, increasing the concentration of the peptide
fragments by dehydration shifts the transamination reactions in favor of
synthesis of large molecular weight polypeptides, and the reactions
proceed without ATP hydrolysis (Neumann et al., Biochemistry 73:33,
1959). Thus, after incubation of a set of labeled polypeptides of a
constant length and mean molecular weight, formation of novel lower
and higher molecular weight sequences may be seen. A variety of
endoproteases, exoproteases and other enzymes may be used to drive
the efficient synthesis of larger polypeptides from smaller peptide
substrates. Enzymes used include subtilisin, papain, thermolysin,
chymotrypsin, and carboxypeptidase Y, in enzyme concentrations
ranging from micromolar to millimolar, and substrate concentrations
ranging from millimolar to molar (Wong and Wang, Experientia 47:1123-
1129, 1991).
Based upon a solubility of 1.0 mg/ml for the polypeptide
fusion library, then at a diversity of 100, each 71 amino acid fusion
peptide is present at approximately 0.6 micromolar concentration. With
a diversity of 1,000,000, each is present at 0.06 nanomolar
concentration. In a volume of 10 ml, a diversity of 1,000,000
corresponds to 10 nanograms of each. These concentrations are
detectable. For example, gold stained blots on Immobilon P filters can
detect spots with 3.5 nanograms, and polyacrylamide gel staining can
detect bands or spots of 2.0 nanograms (Pluskal et al., Bio/Techniques
4(3):272-282, 1986; Ausubel et al., eds., Current Protocols in Molecular
Biology, Greene Publishing and Wiley-Interscience, New York, 1987).
Radiolabeling increases detectability by more than an order of
magnitude (Garrells, Methods Enzymol. 254:7961-7977, 1979). In order
to maximize substrate, hence product concentrations, the diversity and
concentration of the polypeptide library may be tuned to find that
minimum diversity and maximum concentration at which preferred new
prominent bands appear. In addition to running one-dimensional SDS
polyacrylamide gels, reaction mixtures are analyzed on two-dimensional
gels, running first an isoelectric dimension, followed by SDS PAGE
analysis (O'Farrell, J. Biol. Chem. 250:4007-4021, 1975; Garrells, Methods
Enzymol. 254:7961-7977, 1979; Summers and Kauffman, Developmental
Biology 113:49-63, 1986). Automated facilities for digitized gel data
analysis are available. Two-dimensional gels may be used to confirm
that unique bands on one-dimensional gels correspond to unique spots
in two dimensions, hence a single product polypeptide. This allows one
to count the number of reaction products.
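The per-species concentrations quoted above follow from dividing the total protein mass evenly among the library members. The sketch below reproduces that arithmetic. The molecular weight of about 16,700 g/mol for a 71 amino acid fusion peptide is an assumption back-calculated from the stated 0.6 micromolar figure, not a value given in the text, and the function name `per_species` is hypothetical.

```python
ASSUMED_MW = 16_700.0  # g/mol; assumption inferred from the 0.6 micromolar figure


def per_species(total_mg_per_ml, diversity, mw_g_per_mol, volume_ml):
    """Return (molar concentration in mol/L, mass in ng) of each library member,
    assuming the total mass is shared equally among `diversity` species."""
    grams_per_litre = total_mg_per_ml / diversity  # 1 mg/ml equals 1 g/L
    molar = grams_per_litre / mw_g_per_mol
    mass_ng = total_mg_per_ml * volume_ml / diversity * 1e6  # mg to ng
    return molar, mass_ng


for diversity in (100, 10_000, 1_000_000):
    molar, mass_ng = per_species(1.0, diversity, ASSUMED_MW, 10.0)
    print(f"diversity {diversity:>9,}: {molar * 1e9:10.3f} nM, "
          f"{mass_ng:12.1f} ng in 10 ml")
```

Under these assumptions a diversity of 100 gives roughly 0.6 micromolar per species, and a diversity of 1,000,000 gives roughly 0.06 nanomolar, or 10 nanograms of each in 10 ml.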
For subcritical reaction systems of minimal diversity, only a
few novel products are formed, and no further catalyzed reactions occur
due to these new polymers. Thus, as incubation increases, no new
bands or spots are generated. From the number of novel polypeptides
produced, the probability that an arbitrary polypeptide catalyzes a
reaction may be quantified. As above, cleavage and transamination
reactions among polypeptide substrates of length L typically yield two
products of length less than 2L. Ligation and crosslinking reactions yield
one product with a total of 2L amino acids. Using two-dimensional gels,
the number of distinct products of molecular weights corresponding to a
total of 2L amino acids are discriminated, since one knows an expected
mean molecular weight and a calculable variance. Thus, for a modest
number of novel bands and spots, the total number of reactions
catalyzed may be estimated. From this, the probability that a
polypeptide catalyzes a reaction can be calculated. As the lengths of
the polypeptides are altered, one may obtain measures of the scaling
relation for numbers of types of reactions catalyzed as a function of
polymer length of substrates and enzymes.
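The band-counting rule can be written out as a short sketch. It assumes that every product is visible (as with uniform labeling), that products shorter than 2L arise in pairs, and that each product of length 2L comes from one ligation or crosslinking event. The function name `estimate_reaction_count` and the example lengths are hypothetical.

```python
def estimate_reaction_count(novel_product_lengths, substrate_length):
    """Estimate the number of catalyzed reactions from distinct novel products.

    Products shorter than 2L are counted two per reaction (cleavage,
    transesterification or transamination); products of length exactly 2L
    are counted one per reaction (ligation or crosslinking).
    """
    two_l = 2 * substrate_length
    paired = sum(1 for n in novel_product_lengths if n < two_l)
    joined = sum(1 for n in novel_product_lengths if n == two_l)
    return paired / 2 + joined


# Hypothetical gel read-out for substrates of length L = 38:
# one cleavage (10 + 28), one exchange (50 + 26), one ligation (76).
print(estimate_reaction_count([10, 28, 50, 26, 76], 38))
```

With end labeling only one fragment of each cleavage is visible, so the division by two would undercount; the sketch does not model that case.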
As noted above, phase transitions afford the ability to
catalyze an explosion of molecule diversity from a diverse founder set of
organic molecules acted upon by a sufficient diversity of potential
catalytic polymers. Where target small molecules of interest are
detected among the products of the catalyzed reactions, the logarithmic
partitioning procedures above should allow the recovery of the specific
substrates and novel enzymes leading to the molecule of interest.
In supracritical reaction systems, by definition, new
products become substrates for yet further reactions engendering still
further new products which again are candidate substrates. Three
signatures are monitored to establish supracritical behavior. First, over
time, the diversity of substrate and product species increases. This is
the major criterion. Second, over time, the maximum molecular weight
product increases. Third, the mean and variance in the molecular weight
distribution among the products increases in a calculable way.
The second and third signatures require elaboration. In a
supracritical reaction system where the initial substrate single stranded
DNA polymers are all of length L, the maximum length polymer which
can be formed by a single ligation reaction is of length 2L. The
maximum length which can be formed by use of two such newly formed
polymers in a new ligation reaction where they are the substrates is 4L,
then 8L and so forth. Thus, visualization of an increasing maximum
molecular weight among the product molecules is evidence favoring
supracritical behavior of the reaction system. More generally, in model
reaction systems whose founder substrate sets are only a few monomers
in length, the mean and variance in molecular weights among the
product polymers increase over time and give rise to a characteristic
unimodal distribution. The diversity of polymers of a given length
present in the system can be plotted on the ordinate and the lengths of
those polymers on the abscissa. As reactions proceed creating a
diversity of small and large products, the resulting curve may rise steeply
to a peak as length increases, then fall off with an exponential tail.
In the first set of experiments, the diversity of new bands
which appear on sequencing gels is analyzed as a function of time and
as a function of the diversity of the polypeptide library catalyzing the
reactions. In minimally diverse DNA substrate systems a modest
number of new products may appear early, then not increase over time.
In systems with a substantially higher diversity of single stranded DNA
substrate sequences, detection of a sustained increase in total diversity
over time (as limited by the product concentrations required for
detection) and detection of a sustained increase in the highest molecular
weight classes seen, are strong evidence for supracritical behavior of the
reaction system.
In a second set of experiments, forward reaction velocities
are driven, and the reaction system maintained in non-equilibrium
conditions, by sustaining the concentration of the founder set of single
stranded DNA sequences through periodic or continuous addition of
labeled single stranded DNA sequences forming that set. Sustained
non-equilibrium conditions through "driving" by addition of founder
substrate molecules may be important to achieve high concentrations of
high molecular weight polymers. The catalyzed reactions funnel
monomers to specific large polymers.
Addition of founder substrate DNA polymers is carried out
in two ways. In the first way, substrates are added to an otherwise
closed stirred reactor. In the second way, substrates are added to a
flow chemostat. The two environments are quite different. In a closed
stirred reactor, product molecules are not removed from the system
except by back reactions or further reactions in which they are
substrates. In a flow chemostat, product molecules are removed. As
shown in detail by Eigen and Schuster (The Hypercycle: A Principle of
Natural Self-Organization, Springer-Verlag, New York, 1979), the
chemostat system driven by continuous addition of substrate molecules
is an environment which carries out selection on the reaction products:
The total mass of substrate nucleotides ultimately becomes constant.
The fraction of these which are organized into product molecules of
different sizes may change. Those product molecules which are
produced faster than they are diluted by the outflow actually accumulate
in concentration; the remainder are gradually eliminated. Thus, the
closed reaction system allows one to test for the total increase in
product diversity over time. The flow chemostat environment allows one
to test, as a function of flow and driving rates, whether the reaction
system settles down to a sustained set of founder polymers and their
direct and indirect reaction products.
Parallel experiments are carried out in which both the
substrates and the catalysts are polypeptides. To do so, one may again
begin with the minimal diversity 71 or 104 amino acid polypeptide
libraries required to see the onset of catalysis of new molecular size
products, then tune diversity upward several orders of magnitude.
Minimally complex polypeptide systems can form a small number of
novel product polymers which does not increase further over incubation
time. A supracritical system shows an increasing diversity over time.
One-dimensional and two-dimensional gel electrophoresis
are used to analyze the total increase in diversity over time. Unlike
analysis of DNA sequences, however, use of two-dimensional gels may
allow one to discriminate several novel product molecules with the same
molecular weight on SDS PAGE analysis. A sustained increase in total
diversity over time (as limited by the product concentrations detectable),
and a sustained increase in the highest molecular weight classes seen,
are strong evidence for supracritical behavior of the reaction system.
In a second set of experiments, labeled amino acids and
short peptides, up to hexamers, are incubated with libraries of increasing
diversity from the larger amino acid library plus the polypeptide library.
By one- and two-dimensional gel analysis, the labeled amino acids and
small peptides are tested for incorporation into high molecular weight
material. Control experiments use affinity purified ubiquitin alone with
the labeled amino acids and small peptides, and the labeled amino
acids and small peptides incubated by themselves.
Supracritical behavior may be demonstrated in a
particularly clean way: Theoretical work shows that a sufficiently low
diversity founder set of amino acids and small peptides will be
subcritical. However, if the concentrations of members of that founder
set are maintained by exogenous addition, and the set is incubated with
a high diversity of larger polypeptides added once only at the outset of
the experiment, then the larger polypeptides can catalyze the formation
of many polypeptides built up out of the founder set. Those novel
polypeptides themselves come to play catalytic roles in sustaining the
formation of themselves and yet further novel polypeptides. Indeed,
such a system might include collectively autocatalytic sets of
polypeptides. In short, the small peptides alone, in sustained
concentrations, are subcritical, but transient exposure to a high diversity
of larger polypeptides triggers supracritical behavior which is thereafter
sustained without further addition of the larger polypeptides.
To carry out this experiment, the above flow chemostat
experiments are extended using labeled amino acids and small peptides,
incubated with an initial set of diverse 71 or 104 amino acid
polypeptides. The concentrations of the founder set of labeled amino
acids and small peptides are sustained. At a critical diversity of 71 or
104 amino acid polypeptides, not only is incorporation of amino acids
and small peptides into high molecular weight material seen, but also
persistence of that incorporation under the chemostat conditions which
lead to the exponential dilution and ultimate loss of all initial 71 or 104
amino acid polypeptides. Such sustained synthesis of large polymers
from the sustained founder set demonstrates that transient incubation
with the high diversity library of 71 or 104 amino acid polypeptides
triggers a phase transition in the system of amino acids and small
peptides.
In order to confirm that exposure of a collection of organic
molecules to a diversity of polypeptides leads to synthesis of an
increasing diversity of organic molecules, a reliable means of detecting
and discriminating small quantities of organic molecules is required.
HPLC analysis appears to fulfill the requirements. With UV absorbance
detection, HPLC can detect concentrations down to the nanomolar
range. For example, tryptophan can be detected down to about 10
nanomolar. It may be possible to increase the range of small molecules
which are detectable using IR rather than UV spectra (Kemp and
Vellaccio, Organic Chemistry, Worth Publishers Inc., 1980). A chosen
set of fifty to a few hundred organic molecules gives rise to a discrete
set of peaks which can be discriminated from a far more complex
mixture containing a number of additional peaks due to the presence of
new product molecules. Evidence of reactions includes both the
appearance of new peaks and the disappearance of the initial substrate
peaks.
In these experiments, sets of founder organic molecules are
first assembled with well-displaced peaks on HPLC analysis, followed by
sequential addition of trial substrate compounds to solutions containing
previously accepted members of the founder set. Founder sets are
created which optimize both founder concentrations and diversity, such
that novel product molecules yield easily detectable peaks.
As in the other experiments described above, experiments
are carried out with a fixed input of founder organic molecules, and
under conditions which drive forward synthesis and hold the system
displaced from equilibrium by continuous addition of the founder set of
organic molecules to otherwise closed stirred reaction systems. In a
subset of experiments, radioactively labeled founder set molecules are
used to establish that radioactive atoms are incorporated into new
product molecules. The concentrations of product molecules ultimately
depend upon the ratio of the diversity of founder set to product set, the
number of reaction steps from the founder set to a given product
molecule, and the detailed forward and reverse kinetics along the
reaction pathway(s) leading to and from the product species. On
average, however, if the founder set diversity is 100 and the set
members are present in millimolar concentration initially, if the system
were otherwise closed and if the final diversity were about ten million,
then the terminal product concentrations might be about 10 nanomolar.
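The 10 nanomolar estimate can be checked with a one-line dilution calculation. The sketch assumes a closed system in which the total molar amount of founder material is conserved and ends up spread evenly over all final species; this even-spread simplification and the function name `mean_product_concentration` are illustrative assumptions, and real concentrations would depend on the kinetics noted above.

```python
def mean_product_concentration(founder_diversity, founder_molar, final_diversity):
    """Average molar concentration per species when the total founder
    material is shared evenly among `final_diversity` species."""
    total_molar = founder_diversity * founder_molar
    return total_molar / final_diversity


# 100 founders at 1 millimolar each, final diversity of ten million species.
c = mean_product_concentration(100, 1e-3, 10_000_000)
print(f"{c * 1e9:.1f} nM")
```

Here 100 founders at 1 millimolar give 0.1 molar in total, and dividing by ten million species gives about 10 nanomolar each.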
Once having established the conditions under which only a
few reactions are catalyzed and thus in which product peaks are easily
detected, the foundation is provided by which to increase the diversity of
the polypeptides to which the same founder set is exposed. For a
sufficient diversity of polypeptides, a very large increase in the diversity
of small organic product molecules, hence peaks, is seen in the system.
As in our analysis of systems using DNA or polypeptides of fixed initial
length, here too, as reactions proceed, ever larger molecular weight
products can be formed. Thus, in supracritical systems, both diversity
and maximum molecular weight increase with time and with the diversity
of the polypeptide library.
These experiments demonstrate that a large diversity of
organic compounds can be formed by catalyzing reactions from a
sustained founder set of small organic molecules. Thus, these
experiments lead to the application of these new technologies to the
generation of high diversity libraries of small molecules as drug
candidates.
Once a diversity of novel organic products is generated, the
logarithmically iterative procedure defined above may be utilized to
isolate both the set of novel enzymes leading to a specific product
molecule, and the set of founder organic molecules which are the initial
substrates needed for the chain of reactions leading to the product
molecule. This procedure is a minor modification of that described
above and reflects the fact that several, e.g., 4, enzymes might be
needed to catalyze a chain of reactions, and reflects the fact that several,
e.g., 7, initial substrates may be required in those reactions. The four
enzymes may be logarithmically isolated as follows. At each step, the
current polypeptide library diversity is randomly partitioned into ten
aliquots each containing a random 0.7 of the total diversity. The
probability that any aliquot contains the four requisite polypeptides is
.24, hence on average two of the aliquots have the four enzymes.
Reactions with the full diversity of initial substrates are carried out and
the target of interest identified in one or two aliquots, thereby reducing
the polypeptide library diversity by a factor 0.7. Successive cycles will,
again in a logarithmic number of steps, isolate the four enzymes needed.
To cut the substrate diversity down to the seven substrates needed, the
substrate diversity is randomly assigned to 10 aliquots each containing a
random 0.8 of the initial diversity. The probability that any aliquot has
the seven critical substrates is .21, thus on average two aliquots are
successful.
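The aliquot probabilities quoted for the halving and partitioning steps all follow one rule: if an aliquot keeps a random fraction f of the diversity and k specific members are required, the chance that it keeps all k is approximately f to the power k. The sketch below assumes each required member is retained independently, which is a close approximation when the library is much larger than k. The function names `aliquot_stats` and `rounds_to_isolate` are hypothetical.

```python
import math


def aliquot_stats(fraction, n_required, n_aliquots):
    """Return (p, expected, p_any) for one partitioning step.

    p        : chance that a single aliquot retains all required members
    expected : mean number of successful aliquots among `n_aliquots`
    p_any    : chance that at least one aliquot is successful
    """
    p = fraction ** n_required
    expected = n_aliquots * p
    p_any = 1 - (1 - p) ** n_aliquots
    return p, expected, p_any


def rounds_to_isolate(diversity, fraction, n_required=1):
    """Rounds needed to shrink `diversity` down to the `n_required` members
    when each round keeps `fraction` of the current diversity."""
    return math.ceil(math.log(diversity / n_required) / math.log(1 / fraction))


cases = [
    ("one enzyme, halves, 4 aliquots", 0.5, 1, 4),
    ("two substrates, halves, 8 aliquots", 0.5, 2, 8),
    ("four enzymes, 0.7 fraction, 10 aliquots", 0.7, 4, 10),
    ("seven substrates, 0.8 fraction, 10 aliquots", 0.8, 7, 10),
]
for label, f, k, n in cases:
    p, expected, p_any = aliquot_stats(f, k, n)
    print(f"{label}: p={p:.2f}, expected={expected:.1f}, at least one={p_any:.2f}")
```

The `p_any` value indicates how often a round would need extra aliquots, and `rounds_to_isolate` shows the cost of larger fractions: each round removes less diversity, so more rounds are needed.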
This analysis is of considerable interest for two reasons.
First, it establishes that a sequence of reactions, not just a single
reaction, is catalyzed by a set of novel enzymes, leading from a set of
initial substrates in the founder collection to a target molecule many
synthetic steps away. Second, such a procedure constitutes a radically
new approach to the problems of organic synthesis. Here diversity and
screening procedures are used to identify simultaneously not only
de novo enzymes, but also the set of substrates leading via a sequence
of catalyzed reactions to a target organic compound. The second
interest, of course, relates to drug discovery.
There are several alternative approaches to finding such
drug candidates. In a first, a receptor for the normal agonist is already
in hand and is used to screen for small molecule mimics of the agonist.
In a second, no receptor is yet available, but only the agonist itself. In a
third, inhibitors of an enzyme are sought. As an example of the first
approach, one might wish to detect the presence of an organic molecule
of interest present in nanomolar concentration, because it binds to a
specific cloned cell receptor. Such detection is attainable by a
competition assay with the normal ligand for the cloned receptor.
Labeled normal ligand would not bind or would show reduced binding in
the presence of the entirely unknown small molecule present in the
reaction mixture. As discussed below, nanomolar concentrations suffice
for detection. Where a binding event is detected when the unknown
product is in the nanomolar range, then the above described logarithmic
dilution process may be used to find both the enzymes and substrates
leading to synthesis of a new organic molecule able to bind a cell
receptor. Note that neither the target molecule, nor the specific initial
substrates, nor the enzymes required for synthesis of the target from the
founder set of substrates, need to be known in advance. Any such
molecule is a drug candidate to bind to the receptor, hence modify or
mimic or antagonize the activity of the normal agonist.
In the second approach, the receptor for the agonist is not
known, but the agonist is known. Here a set of random polypeptides
which bind to the agonist, hence are its shape complements, is sought.
This set of polypeptides then can be used, in place of the unknown
receptor, to screen for novel organic molecules which compete with the
agonist for binding to members of the set of shape complement
polypeptides. While one would not yet know which polypeptides bound
the agonist by groups of atoms which reflected the function of the
agonist, some among the polypeptides presumably do bind the
important agonist epitopes. Thus, the set of organic molecules binding
to the polypeptide set is a set of candidate drugs to mimic or modulate
the activity of the agonist.
A third approach seeks a novel small molecule inhibitor of
an enzyme such as HIV protease by slowing cleavage of the peptide
substrate.
To seek agonist mimetics of estrogen, for example, the
cloned estrogen receptor which is immobilized on Immobilon P filters as
dot blot arrays is utilized. Competition assays are carried out with
radioactively labeled estrogen and the molecules formed in the reaction
mixtures. Dot blot filters are incubated with decreasing concentrations of
labeled estrogen and constant concentrations of the mixture of organic
molecules. Control filters have no organic molecules added. As
estrogen concentration decreases, tests are conducted to determine
whether competitive displacement of the labeled estrogen occurs.
Tritium labeled estrogen and its analogues are available at 150 Ci per
millimole. Thus, a picomole of this probe is 0.15 microcuries. 125I labeled
estrogen and its analogues labeled at over 2200 Ci per millimole are
available. A picomole is 2.2 microcuries. Thus, even less than picomole
quantities of organic molecule competitors which displace such bound
labeled estrogen are detectable. Since novel products in the 100 to
1000 picomolar range are generated, even estrogen mimics with modest
affinity for the receptor displace labeled estrogen present in picomole
concentration, and thus are detectable.
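The activity per picomole of probe follows directly from the specific activity by unit conversion. A minimal sketch, in which the function name `microcuries_per_picomole` is hypothetical:

```python
def microcuries_per_picomole(ci_per_mmol):
    """Convert a specific activity in Ci/mmol to microcuries per picomole.

    1 Ci = 1e6 microcuries and 1 mmol = 1e9 pmol.
    """
    return ci_per_mmol * 1e6 / 1e9


for label, activity in (("tritium label", 150.0), ("iodine-125 label", 2200.0)):
    print(f"{label}: {microcuries_per_picomole(activity):.2f} uCi per pmol")
```

This reproduces the figures in the text: 150 Ci per millimole corresponds to 0.15 microcuries per picomole, and 2200 Ci per millimole to 2.2 microcuries per picomole.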
From the foregoing, it will be appreciated that, although
specific embodiments of the invention have been described herein for
purposes of illustration, various modifications may be made without
deviating from the spirit and scope of the invention.

Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Administrative Status, Maintenance Fee and Payment History, should be consulted.

Title Date
Forecasted Issue Date Unavailable
(86) PCT Filing Date 1994-04-19
(87) PCT Publication Date 1994-10-27
(85) National Entry 1995-10-12
Examination Requested 1995-10-12
Dead Application 2001-04-19

Abandonment History

Abandonment Date  Reason                                      Reinstatement Date
1999-04-19        FAILURE TO PAY APPLICATION MAINTENANCE FEE  1999-08-06
2000-04-19        FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type                                                    Anniversary Year  Due Date    Amount Paid  Paid Date
Application Fee                                                                           $0.00        1995-10-12
Maintenance Fee - Application - New Act                     2                 1996-04-19  $50.00       1996-04-19
Registration of a document - section 124                                                  $0.00        1996-10-17
Maintenance Fee - Application - New Act                     3                 1997-04-21  $50.00       1997-04-14
Maintenance Fee - Application - New Act                     4                 1998-04-20  $50.00       1998-04-07
Reinstatement: Failure to Pay Application Maintenance Fees                                $200.00      1999-08-06
Maintenance Fee - Application - New Act                     5                 1999-04-19  $75.00       1999-08-06
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
KAUFFMAN, STUART A.
Past Owners on Record
REBEK, JULIUS, JR.
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

Document Description                          Date (yyyy-mm-dd)  Number of pages  Size of Image (KB)
Description                                   1994-10-27         68               3,496
Cover Page                                    1996-03-07         1                17
Abstract                                      1994-10-27         1                48
Claims                                        1994-10-27         13               471
International Preliminary Examination Report  1995-10-12         7                270
Office Letter                                 1995-11-22         1                20
Prosecution Correspondence                    1997-11-27         5                175
Examiner Requisition                          1997-05-27         2                112
Examiner Requisition                          2000-02-22         2                86
Fees                                          1997-04-14         1                206
Fees                                          1996-04-19         1                92