Patent 2811403 Summary

(12) Patent Application: (11) CA 2811403
(54) English Title: BIOPRODUCTION OF AROMATIC CHEMICALS FROM LIGNIN-DERIVED COMPOUNDS
(54) French Title: PRODUCTION BIOLOGIQUE DE PRODUITS CHIMIQUES AROMATIQUES A PARTIR DE COMPOSES DERIVES DE LIGNINE
Status: Deemed Abandoned and Beyond the Period of Reinstatement - Pending Response to Notice of Disregarded Communication
Bibliographic Data
(51) International Patent Classification (IPC):
  • C07K 14/21 (2006.01)
  • C12N 15/54 (2006.01)
  • C12N 15/63 (2006.01)
  • C12P 1/04 (2006.01)
  • C12P 7/22 (2006.01)
(72) Inventors :
  • CHATTERJEE, RANJINI (United States of America)
  • ZAHN, KENNETH (United States of America)
  • MITCHELL, KENNETH (United States of America)
  • LIU, GARY Y. (United States of America)
(73) Owners :
  • ALIGNA TECHNOLOGIES, INC.
(71) Applicants :
  • ALIGNA TECHNOLOGIES, INC. (United States of America)
(74) Agent: GOWLING WLG (CANADA) LLP
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 2011-08-29
(87) Open to Public Inspection: 2012-03-22
Availability of licence: N/A
Dedicated to the Public: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/US2011/049619
(87) International Publication Number: WO 2012/036884
(85) National Entry: 2013-03-14

(30) Application Priority Data:
Application No. Country/Territory Date
61/403,400 (United States of America) 2010-09-15
61/455,709 (United States of America) 2010-10-25

Abstracts

English Abstract

The teachings provided herein are generally directed to a method of converting lignin-derived compounds to valuable aromatic chemicals using an enzymatic, bioconversion process. The teachings provide a selection of (i) host cells that are tolerant to the toxic compounds present in lignin fractions; (ii) polypeptides that can be used as enzymes in the bioconversion of the lignin fractions to the aromatic chemical products; (iii) polynucleotides that can be used to transform the host cells to express the selection of polypeptides as enzymes in the bioconversion of the lignin fractions; and (iv) the transformants that express the enzymes.


French Abstract

L'invention concerne de manière générale un procédé de conversion de composés dérivés de lignine en produits chimiques aromatiques de valeur au moyen d'un procédé de conversion biologique enzymatique. L'invention concerne une sélection (i) de cellules hôtes qui sont tolérantes aux composés toxiques présents dans les fractions de lignine ; (ii) de polypeptides qui peuvent être utilisés comme enzymes dans la conversion biologique des fractions de lignine en produits chimiques aromatiques ; (iii) de polynucléotides qui peuvent être utilisés pour transformer les cellules hôtes afin qu'elles expriment la sélection de polypeptides en tant qu'enzymes dans la conversion biologique des fractions de lignine ; et (iv) de transformants qui expriment les enzymes.

Claims

Note: Claims are shown in the official language in which they were submitted.


WE CLAIM
1. An isolated recombinant polypeptide, comprising:
an amino acid sequence having at least 95% identity to SEQ ID NO:101, the
amino acid
sequence conserving residues 1, 2, 4-8, 10-12, 14, 17, 19-22, 24, 25, 27-37,
39,
41-54, 57, 58, 60, 62-67, 69-73, 75, 77-80, 82-87, 89, 100, 102, 103, 104,
105,
107, 110-114, 117, 212, 122, 124-130, 133, 134, 137-139, 148, 149, 151-156,
159, 160, 166-168, 170, 173, 174, 178-181, 184, 185, 187-189, 198-201, 204,
205, 207, 210-216, 219, 222, 223, 226-232, 235-239, 242-246, 249, 251, 254,
257, 264, 266, 267, 270, 275, and 278 of SEQ ID NO:101;
wherein, an amino acid substitution outside of the conserved residues is a
conservative
substitution; and,
the amino acid sequence functions to cleave a beta-aryl ether.
2. An isolated recombinant polypeptide, comprising:
an amino acid sequence having at least 95% identity to SEQ ID NO:101, the
amino acid
sequence conserving residues 19-22, 24, 25, 27-30, 33-36, 39-45, 47, 48, 50-
54;
100, 101, 104, 111, 112, 115, 116, 166, 107, 184, 187, 188, 191, 192, and 195 of SEQ ID NO:101;
wherein an amino acid substitution outside of the conserved residues is a
conservative
substitution.
3. The isolated recombinant polypeptide of claim 2, wherein the amino acid
sequence
functions to cleave a beta-aryl ether.
4. An isolated recombinant polypeptide, comprising:
SEQ ID NO:101; or conservative substitutions thereof outside of conserved
residues 19-
22, 24, 25, 27-30, 33-36, 39-45, 47, 48, 50-54; 100, 101, 104, 111, 112, 115,
116, 166, 107, 184, 187, 188, 191, 192, and 195 of SEQ ID NO:101.
5. An isolated recombinant glutathione S-transferase enzyme, comprising:
an amino acid sequence having at least 95% identity to SEQ ID NO:101, the amino acid sequence conserving residues 19-22, 24, 25, 27-30, 33-36, 39-45, 47, 48, 50-54; 100, 101, 104, 111, 112, 115, 116, 166, 107, 184, 187, 188, 191, 192, and 195 of SEQ ID NO:101;
wherein, the amino acid sequence functions to cleave a beta-aryl ether.
6. An isolated recombinant glutathione S-transferase enzyme, comprising:
an amino acid sequence having at least 95% identity to SEQ ID NO:101; wherein, the amino acid sequence functions to cleave a beta-aryl ether.
7. An isolated recombinant polypeptide, comprising:
a length ranging from about 279 to about 281 amino acids;
a first amino acid region consisting of residues 19-54 from SEQ ID NO:101, or
conservative substitutions thereof outside of conserved residues 19-22, 24,
25,
27-30, 33-36, 39-45, 47, 48, and 50-54 of SEQ ID NO:101; and,
a second amino acid region consisting of residues 98-221 from SEQ ID NO:101,
or
conservative substitutions thereof outside of conserved residues 100, 101,
104,
111, 112, 115, 116, 166, 107, 184, 187, 188, 191, 192, and 195 of SEQ ID NO:101.
8. An isolated recombinant glutathione S-transferase enzyme, comprising:
a length ranging from about 279 to about 281 amino acids;
a first amino acid region having at least 95% identity to residues 19-54 from
SEQ ID
NO:101 while conserving residues 19-22, 24, 25, 27-30, 33-36, 39-45, 47, 48,
and 50-54 of SEQ ID NO:101; wherein, the first amino acid region is located in
the recombinant polypeptide from about residue 14 to about residue 59; and,
a second amino acid region having at least 95% identity to residues 98-221
from SEQ ID
NO:101 while conserving residues 100, 101, 104, 111, 112, 115, 116, 166, 107,
184, 187, 188, 191, 192, and 195 of SEQ ID NO:101; wherein, the second amino
acid region is located in the recombinant polypeptide from about residue 93 to
about residue 226; and,
wherein, the recombinant glutathione S-transferase enzyme functions to cleave
a beta-
aryl ether.
9. The isolated recombinant polypeptide of claim 8, wherein an amino acid
substitution
outside of the conserved residues is a conservative substitution.
10. A method of cleaving a beta-aryl ether bond, comprising:
contacting a polypeptide comprising an amino acid sequence having at least 95%
identity to SEQ ID NO:101, the amino acid sequence conserving residues 19-22,
24, 25, 27-30, 33-36, 39-45, 47, 48, 50-54; 100, 101, 104, 111, 112, 115, 116,
166, 107, 184, 187, 188, 191, 192, and 195 of SEQ ID NO:101, with a lignin-derived
compound having (i) a beta-aryl ether bond and (ii) a molecular weight ranging
from about 180 Daltons to about 3000 Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
11. The method of claim 10, wherein the lignin-derived compound has a
molecular weight of
about 180 Daltons to about 1000 Daltons.
12. The method of claim 10, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
13. The method of claim 10, wherein the solvent environment comprises
water.
14. The method of claim 10, wherein the solvent environment comprises a
polar organic
solvent.
15. A method of cleaving a beta-aryl ether bond, comprising:
contacting a polypeptide comprising an amino acid sequence having at least 95%
identity to SEQ ID NO:101, the amino acid sequence conserving residues 19-22,
24, 25, 27-30, 33-36, 39-45, 47, 48, 50-54; 100, 101, 104, 111, 112, 115, 116,
166, 107, 184, 187, 188, 191, 192, and 195 of SEQ ID NO:101, with a lignin-derived
compound having (i) a beta-aryl ether bond and (ii) a molecular weight ranging
from about 180 Daltons to about 3000 Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
16. The method of claim 15, wherein the lignin-derived compound has a
molecular weight of
about 180 Daltons to about 1000 Daltons.
17. The method of claim 15, wherein the solvent environment comprises
water.
18. The method of claim 15, wherein the solvent environment comprises a
polar organic
solvent.
19. A system for bioprocessing lignin-derived compounds, comprising:
a polypeptide comprising an amino acid sequence having at least 95% identity
to SEQ
ID NO:101, the amino acid sequence conserving residues 19-22, 24, 25, 27-30,
33-36, 39-45, 47, 48, 50-54; 100, 101, 104, 111, 112, 115, 116, 166, 107, 184,
187, 188, 191, 192, and 195 of SEQ ID NO:101;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
20. The system of claim 19, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
21. A recombinant polynucleotide comprising a nucleotide sequence that
encodes a
polypeptide comprising an amino acid sequence having at least 95% identity to
SEQ ID
NO:101, the amino acid sequence conserving residues 19-22, 24, 25, 27-30, 33-
36, 39-45, 47,
48, 50-54; 100, 101, 104, 111, 112, 115, 116, 166, 107, 184, 187, 188, 191,
192, and 195 of
SEQ ID NO:101.
22. A recombinant polynucleotide comprising a nucleotide sequence that
encodes a
polypeptide comprising SEQ ID NO:101; or conservative substitutions thereof
outside of
conserved residues 19-22, 24, 25, 27-30, 33-36, 39-45, 47, 48, 50-54; 100,
101, 104, 111, 112,
115, 116, 166, 107, 184, 187, 188, 191, 192, and 195 of SEQ ID NO:101.
23. A vector comprising the polynucleotide of claim 21.
24. A vector comprising the polynucleotide of claim 22.
25. A plasmid comprising the polynucleotide of claim 21.
26. A plasmid comprising the polynucleotide of claim 22.
27. A host cell transformed by the vector of claim 23.
28. A host cell transformed by the vector of claim 24.
29. A method of cleaving a beta-aryl ether bond, comprising
culturing the host cell of claim 27 under conditions suitable to produce the
polypeptide;
recovering the polypeptide from the host cell culture; and,
contacting the polypeptide with a lignin-derived compound having (i) a beta-
aryl ether
bond and (ii) a molecular weight ranging from about 180 Daltons to about 3000
Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
30. The method of claim 29, wherein the host cell is E. coli.
31. The method of claim 29, wherein the host cell is Azotobacter
vinelandii.
32. The method of claim 29, wherein the lignin-derived compound has a
molecular weight of
about 180 Daltons to about 1000 Daltons.
33. The method of claim 29, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
34. The method of claim 29, wherein the solvent environment comprises
water.
35. The method of claim 29, wherein the solvent environment comprises a
polar organic
solvent.
36. A method of cleaving a beta-aryl ether bond, comprising
culturing the host cell of claim 28 under conditions suitable to produce the
polypeptide;
recovering the polypeptide from the host cell culture; and,
contacting the polypeptide with a lignin-derived compound having (i) a beta-
aryl ether
bond and (ii) a molecular weight ranging from about 180 Daltons to about 3000
Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
37. The method of claim 36, wherein the host cell is E. coli.
38. The method of claim 36, wherein the host cell is Azotobacter
vinelandii.
39. The method of claim 36, wherein the lignin-derived compound has a
molecular weight of
about 180 Daltons to about 1000 Daltons.
40. The method of claim 36, wherein the solvent environment comprises
water.
41. The method of claim 36, wherein the solvent environment comprises a
polar organic
solvent.
42. A system for bioprocessing lignin-derived compounds, comprising:
the transformed host cell of claim 27;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
43. The system of claim 42, wherein the transformed host cell comprises
Azotobacter
vinelandii.
44. The system of claim 42, wherein the transformed host cell expresses the
polypeptide in
the solvent in which the lignin-derived compound is soluble.
45. A system for bioprocessing lignin-derived compounds, comprising:
a transformant including a host cell transformed with the vector of claim 23,
the
transformant expressing the polypeptide;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
46. The system of claim 45, wherein the transformant comprises E. coli.
47. The system of claim 45, wherein the transformant comprises Azotobacter
vinelandii.
48. The system of claim 45, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
49. The system of claim 45, wherein the transformed host cell expresses the
polypeptide in
the solvent in which the lignin-derived compound is soluble.
50. The system of claim 45, wherein the lignin-derived compound has a
molecular weight
ranging from about 180 Daltons to about 1000 Daltons.
51. The system of claim 45, wherein the solvent environment comprises
water.
52. The system of claim 45, wherein the solvent environment comprises a
polar organic
solvent.
53. A system for bioprocessing lignin-derived compounds, comprising:
a transformant including a host cell transformed with the vector of claim 24,
the
transformant expressing the polypeptide;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
54. The system of claim 53, wherein the transformant comprises E. coli.
55. The system of claim 53, wherein the transformant comprises Azotobacter
vinelandii.
56. The system of claim 53, wherein the transformed host cell expresses the
polypeptide in
the solvent in which the lignin-derived compound is soluble.
57. The system of claim 53, wherein the lignin-derived compound has a
molecular weight
ranging from about 180 Daltons to about 1000 Daltons.
58. The system of claim 53, wherein the solvent environment comprises
water.
59. The system of claim 53, wherein the solvent environment comprises a
polar organic
solvent.
60. A system for bioprocessing lignin-derived compounds, comprising:
a transformant including an Azotobacter vinelandii host cell transformed with
the vector
of claim 23, the transformant expressing the polypeptide;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
61. The system of claim 60, wherein the transformed host cell expresses the
polypeptide in
the solvent in which the lignin-derived compound is soluble.
62. The system of claim 60, wherein the lignin-derived compound has a
molecular weight
ranging from about 180 Daltons to about 1000 Daltons.
63. The system of claim 60, wherein the solvent environment comprises
water.
64. The system of claim 60, wherein the solvent environment comprises a
polar organic
solvent.
65. An isolated recombinant polypeptide, comprising:
an amino acid sequence having at least 95% identity to SEQ ID NO:541, the
amino acid
sequence conserving residues 47-57, 63-76, 100, 101, 104, 107, 111, 112, 115,
116, 176, 194, 197, 198, 201, 202, and 206.
66. The isolated recombinant polypeptide of claim 65, wherein an amino acid
substitution
outside of the conserved residues is a conservative substitution.
67. The isolated recombinant polypeptide of claim 65, wherein, the amino
acid sequence
functions to cleave a beta-aryl ether.
68. An isolated recombinant polypeptide, comprising:
SEQ ID NO:541; or conservative substitutions thereof outside of conserved
residues 47-
57, 63-76, 100, 101, 104, 107, 111, 112, 115, 116, 176, 194, 197, 198, 201,
202,
and 206.
69. An isolated recombinant glutathione S-transferase enzyme, comprising:
an amino acid sequence having at least 95% identity to SEQ ID NO:541, the amino acid sequence conserving residues 47-57, 63-76, 100, 101, 104, 107, 111, 112, 115, 116, 176, 194, 197, 198, 201, 202, and 206;
wherein, the amino acid sequence functions to cleave a beta-aryl ether.
70. An isolated recombinant glutathione S-transferase enzyme, comprising:
an amino acid sequence having at least 95% identity to SEQ ID NO:541; wherein,
the
amino acid sequence functions to cleave a beta-aryl ether.
71. An isolated recombinant polypeptide, comprising:
a length ranging from about 256 to about 260 amino acids;
a first amino acid region consisting of residues 47-57 from SEQ ID NO:541, or
conservative substitutions thereof outside of conserved residues 47, 48, 49,
50,
52, 54, 55, 56, 57;
a second amino acid region consisting of 63-76 from SEQ ID NO:541; and,
a third amino acid region consisting of residues 99-230 from SEQ ID NO:541, or
conservative substitutions thereof outside of conserved residues 100, 101,
104,
107, 111, 112, 115, 116, 176, 194, 197, 198, 201, 202, and 206.
72. An isolated recombinant glutathione S-transferase enzyme, comprising:
a length ranging from about 279 to about 281 amino acids;
a first amino acid region having at least 95% identity to 47-57 from SEQ ID
NO:541, or
conservative substitutions thereof outside of conserved residues 47, 48, 49,
50,
52, 54, 55, 56, 57;
a second amino acid region consisting of 63-76 from SEQ ID NO:541; and,
a third amino acid region having at least 95% identity to residues 99-230 from
SEQ ID
NO:541, or conservative substitutions thereof outside of conserved residues
100,
101, 104, 107, 111, 112, 115, 116, 176, 194, 197, 198, 201, 202, and 206;
wherein, the recombinant glutathione S-transferase enzyme functions to cleave
a beta-
aryl ether.
73. The isolated recombinant polypeptide of claim 72, wherein an amino acid
substitution
outside of the conserved residues is a conservative substitution.
74. A method of cleaving a beta-aryl ether bond, comprising:
contacting an amino acid sequence having at least 95% identity to SEQ ID
NO:541, the
amino acid sequence conserving residues 47-57, 63-76, 100, 101, 104, 107, 111,
112, 115, 116, 176, 194, 197, 198, 201, 202, and 206 with a lignin-derived
compound having (i) a beta-aryl ether bond and (ii) a molecular weight ranging
from about 180 Daltons to about 3000 Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
75. The method of claim 74, wherein the lignin-derived compound has a
molecular weight of
about 180 Daltons to about 1000 Daltons.
76. The method of claim 74, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
77. The method of claim 74, wherein the solvent environment comprises
water.
78. The method of claim 74, wherein the solvent environment comprises a
polar organic
solvent.
79. A method of cleaving a beta-aryl ether bond, comprising:
contacting a polypeptide comprising SEQ ID NO:541; or conservative
substitutions
thereof outside of conserved residues 47-57, 63-76, 100, 101, 104, 107, 111,
112, 115, 116, 176, 194, 197, 198, 201, 202, and 206 with a lignin-derived
compound having (i) a beta-aryl ether bond and (ii) a molecular weight ranging
from about 180 Daltons to about 3000 Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
80. The method of claim 79, wherein the lignin-derived compound has a
molecular weight of
about 180 Daltons to about 1000 Daltons.
81. The method of claim 79, wherein the solvent environment comprises
water.
82. The method of claim 79, wherein the solvent environment comprises a
polar organic
solvent.
83. A system for bioprocessing lignin-derived compounds, comprising:
a polypeptide having at least 95% identity to SEQ ID NO:541, the amino acid
sequence
conserving residues 47-57, 63-76, 100, 101, 104, 107, 111, 112, 115, 116, 176,
194, 197, 198, 201, 202, and 206;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
84. The system of claim 83, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
85. A recombinant polynucleotide comprising a nucleotide sequence that
encodes the
polypeptide of claim 65.
86. A recombinant polynucleotide comprising a nucleotide sequence that
encodes the
polypeptide of claim 68.
87. A vector comprising the polynucleotide of claim 65.
88. A vector comprising the polynucleotide of claim 68.
89. A plasmid comprising the polynucleotide of claim 65.
90. A plasmid comprising the polynucleotide of claim 68.
91. A host cell transformed by the vector of claim 87.
92. A host cell transformed by the vector of claim 88.
93. A method of cleaving a beta-aryl ether bond, comprising
culturing the host cell of claim 91 under conditions suitable to produce the
polypeptide;
recovering the polypeptide from the host cell culture; and,
contacting the polypeptide with a lignin-derived compound having (i) a beta-
aryl ether
bond and (ii) a molecular weight ranging from about 180 Daltons to about 3000
Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
94. The method of claim 93, wherein the host cell is E. coli.
95. The method of claim 93, wherein the host cell is Azotobacter
vinelandii.
96. The method of claim 93, wherein the lignin-derived compound has a
molecular weight of
about 180 Daltons to about 1000 Daltons.
97. The method of claim 93, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
98. The method of claim 93, wherein the solvent environment comprises
water.
99. The method of claim 93, wherein the solvent environment comprises a
polar organic
solvent.
100. A method of cleaving a beta-aryl ether bond, comprising
culturing the host cell of claim 92 under conditions suitable to produce the
polypeptide;
recovering the polypeptide from the host cell culture; and,
contacting the polypeptide with a lignin-derived compound having (i) a beta-
aryl ether
bond and (ii) a molecular weight ranging from about 180 Daltons to about 3000
Daltons;
wherein, the contacting occurs in a solvent environment in which the lignin-
derived
compound is soluble.
101. The method of claim 100, wherein the host cell is E. coli.
102. The method of claim 100, wherein the host cell is Azotobacter vinelandii.
103. The method of claim 100, wherein the lignin-derived compound has a
molecular weight
of about 180 Daltons to about 1000 Daltons.
104. The method of claim 100, wherein the solvent environment comprises water.
105. The method of claim 100, wherein the solvent environment comprises a
polar organic
solvent.
106. A system for bioprocessing lignin-derived compounds, comprising:
the transformed host cell of claim 91;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
107. The system of claim 106, wherein the transformed host cell comprises
Azotobacter
vinelandii.
108. The system of claim 106, wherein the transformed host cell expresses the
polypeptide of
claim 65 in the solvent in which the lignin-derived compound is soluble.
109. A system for bioprocessing lignin-derived compounds, comprising:
a transformant including a host cell transformed with the vector of claim 87,
the
transformant expressing the polypeptide;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
110. The system of claim 109, wherein the transformant comprises E. coli.
111. The system of claim 109, wherein the transformant comprises Azotobacter
vinelandii.
112. The system of claim 109, wherein an amino acid substitution outside of
the conserved
residues is a conservative substitution.
113. The system of claim 109, wherein the transformed host cell expresses the
polypeptide in
the solvent in which the lignin-derived compound is soluble.
114. The system of claim 109, wherein the lignin-derived compound has a
molecular weight
ranging from about 180 Daltons to about 1000 Daltons.
115. The system of claim 109, wherein the solvent environment comprises water.
116. The system of claim 109, wherein the solvent environment comprises a
polar organic
solvent.
117. A system for bioprocessing lignin-derived compounds, comprising:
a transformant including a host cell transformed with the vector of claim 88,
the
transformant expressing the polypeptide;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
118. The system of claim 117, wherein the transformant comprises E. coli.
119. The system of claim 117, wherein the transformant comprises Azotobacter
vinelandii.
120. The system of claim 117, wherein the transformed host cell expresses the
polypeptide in
the solvent in which the lignin-derived compound is soluble.
121. The system of claim 117, wherein the lignin-derived compound has a
molecular weight
ranging from about 180 Daltons to about 1000 Daltons.
122. The system of claim 117, wherein the solvent environment comprises water.
123. The system of claim 117, wherein the solvent environment comprises a
polar organic
solvent.
124. A system for bioprocessing lignin-derived compounds, comprising:
a transformant including an Azotobacter vinelandii host cell transformed with
the vector
of claim 87, the transformant expressing the polypeptide;
a lignin-derived compound having a beta-aryl ether bond and a molecular weight
ranging
from about 180 Daltons to about 3000 Daltons; and,
a solvent in which the lignin-derived compound is soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
125. The system of claim 124, wherein the transformed host cell expresses the
polypeptide of
claim 65 in the solvent in which the lignin-derived compound is soluble.
126. The system of claim 124, wherein the lignin-derived compound has a
molecular weight
ranging from about 180 Daltons to about 1000 Daltons.
127. The system of claim 124, wherein the solvent environment comprises water.
128. The system of claim 124, wherein the solvent environment comprises a
polar organic
solvent.
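
Many of the claims above combine three tests: a minimum percent identity to a reference sequence, exact conservation at listed residue positions, and conservative substitutions everywhere else. The following Python sketch illustrates how such a test could be applied to a candidate sequence. It is only an illustration under stated assumptions: the function names are hypothetical, the two sequences are assumed to be already aligned and of equal length (a real comparison would use a sequence alignment), and the table of conservative substitution groups is a common textbook-style grouping, not a definition taken from the claims. SEQ ID NO:101 and SEQ ID NO:541 are not reproduced in this section, so any sequence passed in is supplied by the reader.

```python
from typing import Set

# ASSUMPTION: amino acids in the same group are treated as conservative
# substitutions for one another. The claims do not define this grouping.
CONSERVATIVE_GROUPS = [
    set("AVLIM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ"),
]


def parse_residues(spec: str) -> Set[int]:
    """Expand a claim-style list such as '19-22, 24, 25' into 1-based positions."""
    positions: Set[int] = set()
    # Treat ';' and the word 'and' as additional separators.
    for token in spec.replace(";", ",").replace("and", ",").split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start, end = (int(part) for part in token.split("-"))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(token))
    return positions


def percent_identity(ref: str, cand: str) -> float:
    """Percent of identical positions; assumes pre-aligned, equal-length input."""
    if len(ref) != len(cand):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    matches = sum(a == b for a, b in zip(ref, cand))
    return 100.0 * matches / len(ref)


def is_conservative(a: str, b: str) -> bool:
    """True if a and b are identical or share an assumed conservative group."""
    return a == b or any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def satisfies_claim_pattern(
    ref: str, cand: str, conserved: Set[int], min_identity: float = 95.0
) -> bool:
    """Check identity threshold, conserved positions, and substitution type."""
    if percent_identity(ref, cand) < min_identity:
        return False
    for pos, (a, b) in enumerate(zip(ref, cand), start=1):
        if pos in conserved and a != b:
            return False  # a listed residue was changed
        if pos not in conserved and not is_conservative(a, b):
            return False  # a non-conservative change outside the listed residues
    return True
```

A caller would pass the reference sequence, a candidate, and the parsed residue list, for example `satisfies_claim_pattern(ref, cand, parse_residues("19-22, 24, 25, 27-30"))`. Whether a given sequence actually falls within a claim is a legal and experimental question that this sketch does not answer.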

Description

Note: Descriptions are shown in the official language in which they were submitted.


CA 02811403 2013-03-14
WO 2012/036884 PCT/US2011/049619
BIOPRODUCTION OF AROMATIC CHEMICALS FROM LIGNIN-DERIVED COMPOUNDS
RANJINI CHATTERJEE
KENNETH ZAHN
KENNETH MITCHELL
GARY LIU
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application Nos. 61/403,440, filed 9/15/2010; and 61/455,709, filed 10/25/2010; each application of which is hereby incorporated herein by reference in its entirety.
SEQUENCE LISTING
[0002] The instant application is filed with an ASCII compliant text file of a Sequence Listing. The name of the attached file is ALIGPOO4US01 SEQLIST AS-FILED.txt, and the file was created August 29, 2011, is 813 KB in size, and is hereby incorporated herein by reference in its entirety. Because the ASCII compliant text file serves as both the paper copy required by § 1.821(c) and the CRF required by § 1.821(e), the statement indicating that the paper copy and CRF copy of the sequence listing are identical is no longer necessary under 37 C.F.R. § 1.821(f), as per Federal Register / Vol. 74, No. 206 / Tuesday, October 27, 2009, Section I.
BACKGROUND OF THE INVENTION
Field of the Invention
[0003] The teachings provided herein are generally directed to a method of
converting lignin-
derived compounds to valuable aromatic chemicals using an enzymatic,
bioconversion
process.
Description of the Related Art
[0004] Currently, there is a global dependence on petroleum as a depletable feedstock for the manufacture of fuels and chemicals. The problems of using petroleum are so well known and documented that they have become nearly a cliché to the world population. In short, petroleum-based processes are dirty and hazardous. Environmental effects associated with the use of petroleum are known to include, for example, air pollution, global warming, damage from extraction, oil spills, tarballs, and health hazards to humans, domestic animals, and wildlife.
[0005] Oil refineries, for example, are petroleum-based processes that
primarily produce
gasoline. However, they are also used extensively to produce valuable and less
well-known
chemical products used in the manufacture of pharmaceuticals, agrochemicals,
food
ingredients, and plastics. A clean, green alternative to this market area
would be
appreciated worldwide.
[0006] Bioprocesses can present a clean, green alternative to the
petroleum-based processes, a bioprocess being one that uses organisms, cells,
organelles, or enzymes to carry out a commercial process. Biorefineries, for
example, can produce chemicals, heat and power, as well as food, feed, fuel
and industrial chemical products. Examples of biorefineries can include wet
and dry corn mills, pulp and paper mills, and the biofuels industry. In
leather tanning, hides are softened and hair is removed using proteases. In
brewing, amylases are used in germinating barley. In cheese-making, rennin is
used to coagulate the proteins in milk. The biofuels industry, for example,
has been a point of focus recently, naturally focusing on fuel products to
replace petroleum-based fuels and, as a result, has not developed other
valuable chemical products that also rely on petroleum-based processes.
[0007] As such, biorefineries use enzymes to convert natural products to
useful chemicals. A
natural product, such as the wood that is used in a pulp and paper mill,
contains cellulose,
hemicelluloses, and lignin. A typical range of compositions for a hardwood may
be about
40-44% cellulose, about 15-35% hemicelluloses, and about 18-25% lignin.
Likewise, a
typical range of compositions for a softwood may be about 40-44% cellulose,
about 20-32%
hemicelluloses, and about 25-35% lignin. Since all biofuels come from
cellulosic
biorefineries, where the key raw material is glucose, derived from cellulose,
lignin remains
underutilized. Lignin is the single most abundant source of aromatic compounds
in nature,
and the use of lignin is currently limited to low value applications, such as
combustion to
generate process heat and energy for the biorefinery facilities. In the
alternative, lignin is
sold as a natural component of animal feeds or fertilizers. Interestingly,
however, lignin is
the only plant biomass component based on aromatic core structures, and such
core
structures are valuable in the production of industrial chemicals. One of
skill will appreciate
that, unfortunately, a major problem to such a use of lignin remains: the
aromatic
compounds present in the lignin fraction of a biorefinery include toxic
compounds that inhibit
the growth and survival of industrial microbes. For at least these reasons,
processes for
converting lignin fractions to industrial products using industrial microbes
have not been
successful.
[0008] In view of the above, one of skill will appreciate (i) a clean, green
replacement for
petroleum-based processes in the production of valuable chemical products that
include
major markets such as, for example, pharmaceuticals, agrochemicals, food
ingredients, and
plastics; (ii) a profitable use of the abundant and renewable natural resource
available in
lignin, which is currently an industrial waste stream that is underutilized as
an industrial
feedstock; (iii) a selection of host cells that are tolerant to the toxic
compounds present in
lignin fractions in the feedstock; (iv) a selection of polypeptides that can
be used as
enzymes in the bioconversion of the lignin fractions to the valuable chemical
products; (v) a
selection of polynucleotides that can be used to transform host cells to
express the selection
of polypeptides in the bioconversion of the lignin fractions to the valuable
chemical products;
(vi) systems that include transformants that express the enzymes, where the
transformants
can be used to (a) express the enzymes while in direct contact with the lignin
fractions or (b)
express the enzymes for extraction from the cells, after which the extracted
enzymes are
used directly in contact with the lignin fractions; and (vii) a clean-and-
green method of
producing valuable chemical products at higher profits than petroleum-based
processes.
SUMMARY
[0009] This invention is generally directed to a recombinant method of
producing enzymes for
use in the bioconversion of lignin-derived compounds to valuable aromatic
chemicals. In
some embodiments, the teachings are directed to an isolated recombinant
polypeptide,
comprising an amino acid sequence having at least 95% identity to SEQ ID
NO:101. The
sequence can conserve residues T19, I20, S21, P22, V24, W25, T27, K28, Y29,
A30, H33, K34, G35, F36, D39, I40, V41, P42, G43, G44, F45, G47, I48, E50,
R51, T52, G53, G54, K100, A101, N104, V111, G112, M115, F116, P166, W107,
Y184, Y187, R188, G191, G192, and F195.
[0010] In some embodiments, the teachings are directed to an isolated
recombinant
polypeptide, comprising SEQ ID NO:101; or conservative substitutions thereof
outside of the
conserved residues. The conserved residues can include T19, I20, S21, P22,
V24, W25, T27, K28, Y29, A30, H33, K34, G35, F36, D39, I40, V41, P42, G43,
G44, F45, G47, I48, E50, R51, T52, G53, G54; K100, A101, N104, V111, G112,
M115, F116, P166, W107, Y184, Y187, R188, G191, G192, and F195.
[0011] In some embodiments, the teachings are directed to an isolated
recombinant glutathione
S-transferase enzyme, comprising an amino acid sequence having at least 95%
identity to SEQ ID NO:101. The amino acid sequence can conserve residues T19,
I20, S21, P22, V24, W25, T27, K28, Y29, A30, H33, K34, G35, F36, D39, I40,
V41, P42, G43, G44, F45, G47, I48, E50, R51, T52, G53, G54; K100, A101, N104,
V111, G112, M115, F116, P166, W107, Y184, Y187, R188, G191, G192, and F195;
wherein, the amino acid sequence
functions to cleave a beta-aryl ether.
[0012] In some embodiments, the teachings are directed to an isolated
recombinant glutathione
S-transferase enzyme, comprising an amino acid sequence having at least 95%
identity to
SEQ ID NO:101; wherein, the amino acid sequence functions to cleave a beta-
aryl ether.
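By way of a non-limiting illustration, the two sequence criteria recited in the preceding paragraphs (at least 95% identity to a reference sequence, and retention of the named conserved residues) might be checked as in the following Python sketch. The toy sequences, the function names, and the simple identity measure (matching positions divided by aligned length, on pre-aligned sequences of equal length) are assumptions made for illustration only; the toy reference is not SEQ ID NO:101, and no particular alignment method is implied.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percent of positions that match between two pre-aligned sequences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return 100.0 * matches / len(reference)


def conserved_residues_kept(candidate: str, conserved: dict[int, str]) -> bool:
    """True if every 1-based position in `conserved` holds the required residue."""
    return all(
        pos <= len(candidate) and candidate[pos - 1] == aa
        for pos, aa in conserved.items()
    )


# Hypothetical 20-residue toy reference (NOT SEQ ID NO:101).
toy_reference = "MTISPFVWATKYAHKGFDIV"
# One substitution at position 10 (T -> G), outside the conserved set below.
toy_candidate = "MTISPFVWAGKYAHKGFDIV"
# Hypothetical conserved positions, written as {position: residue}.
toy_conserved = {2: "T", 3: "I", 4: "S", 5: "P"}

identity = percent_identity(toy_reference, toy_candidate)
meets_criteria = identity >= 95.0 and conserved_residues_kept(
    toy_candidate, toy_conserved
)
```

In this sketch a candidate passes only when both tests hold, so a sequence with high overall identity is still rejected if a substitution falls on a conserved position.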
[0013] In some embodiments, the teachings are directed to an isolated
recombinant
polypeptide, comprising (i) a length ranging from about 279 to about 281 amino
acids; (ii) a
first amino acid region consisting of residues 19-54 from SEQ ID NO:101, or
conservative
substitutions thereof outside of conserved residues T19, I20, S21, P22, V24,
W25, T27, K28, Y29, A30, H33, K34, G35, F36, D39, I40, V41, P42, G43, G44,
F45, G47, I48, E50,
R51, T52, G53, and G54; wherein, the first amino acid region can be located in
the
recombinant polypeptide from about residue 14 to about residue 59; and, (iii)
a second
amino acid region consisting of residues 98-221 from SEQ ID NO:101, or
conservative
substitutions thereof outside of conserved residues K100, A101, N104, V111,
G112, M115,
F116, P166, W107, Y184, Y187, R188, G191, G192, and F195; wherein, the second
amino
acid region is located in the recombinant polypeptide from about residue 93 to
about residue
226.
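As a hedged illustration of the positional limitation recited above, in which a defined amino acid region is located within a stated window of the recombinant polypeptide, the following Python sketch tests whether a region occurs entirely between two 1-based residue positions. The toy sequences and the function name are hypothetical, and treating "about" as a hard boundary is a simplifying assumption.

```python
def region_within_window(polypeptide: str, region: str, first: int, last: int) -> bool:
    """True if `region` occurs in `polypeptide` entirely within residues first..last.

    Positions are 1-based and inclusive, matching the residue numbering
    convention used in the text.
    """
    start = polypeptide.find(region)
    while start != -1:
        begin, end = start + 1, start + len(region)  # 1-based inclusive span
        if begin >= first and end <= last:
            return True
        start = polypeptide.find(region, start + 1)
    return False


# Hypothetical toy polypeptide; the region "GGSTPWK" spans residues 5 to 11.
toy_polypeptide = "AAAAGGSTPWKAAAA"
inside = region_within_window(toy_polypeptide, "GGSTPWK", 3, 13)
outside = region_within_window(toy_polypeptide, "GGSTPWK", 6, 13)
```

For the first region above, residues 19-54 span 36 positions while the window from about residue 14 to about residue 59 spans 46, so the region can shift by up to about five residues in either direction and still fall inside the window.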
[0014] In some embodiments, the teachings are directed to an isolated
recombinant glutathione
S-transferase enzyme, comprising (i) a length ranging from about 279 to about
281 amino acids; (ii) a first amino acid region having at least 95% identity
to residues 19-54 from SEQ ID NO:101 while conserving residues T19, I20, S21,
P22, V24, W25, T27, K28, Y29, A30, H33, K34, G35, F36, D39, I40, V41, P42,
G43, G44, F45, G47, I48, E50, R51, T52, G53,
and G54; wherein, the first amino acid region is located in the recombinant
polypeptide from
about residue 14 to about residue 59; and, (iii) a second amino acid region
having at least
95% identity to residues 98-221 from SEQ ID NO:101 while conserving residues
K100,
A101, N104, V111, G112, M115, F116, P166, W107, Y184, Y187, R188, G191, G192,
and
F195; wherein, the second amino acid region can be located in the recombinant
polypeptide
from about residue 93 to about residue 226; and, the recombinant glutathione S-
transferase
enzyme can function to cleave a beta-aryl ether.
[0015] In some embodiments, the teachings are directed to an isolated
recombinant glutathione
S-transferase enzyme, comprising an amino acid sequence having at least 95%
identity to
SEQ ID NO:541; wherein, the amino acid sequence functions to cleave a beta-
aryl ether.
[0016] In some embodiments, the teachings are directed to an isolated
recombinant
polypeptide, comprising (i) a length ranging from about 256 to about 260 amino
acids; (ii) a
first amino acid region consisting of residues 47-57 from SEQ ID NO:541, or
conservative
substitutions thereof outside of conserved residues A47, I48, N49, P50, G52,
V54, P55,
V56, L57; wherein, the first amino acid region is located in the recombinant
polypeptide from
about residue 45 to about residue 57; (iii) a second amino acid region
consisting of 63-76
from SEQ ID NO:541; and, (iv) a third amino acid region consisting of residues
99-230 from
SEQ ID NO:541, or conservative substitutions thereof outside of conserved
residues R100,
Y101, K104, D107, M111, N112, S115, M116, K176, L194, I197, N198, S201, H202,
and M206; wherein, the third amino acid region is located in the recombinant
polypeptide from
polypeptide from
about residue 94 to about residue 235.
[0017] In some embodiments, the teachings are directed to an isolated
recombinant glutathione
S-transferase enzyme, comprising (i) a length ranging from about 279 to about
281 amino
acids; (ii) a first amino acid region having at least 95% identity to 47-57
from SEQ ID
NO:541, or conservative substitutions thereof outside of conserved residues
A47, I48, N49,
P50, G52, V54, P55, V56, L57; wherein, the first amino acid region can be
located in the
recombinant polypeptide from about residue 45 to about residue 57; (iii) a
second amino
acid region consisting of 63-76 from SEQ ID NO:541; and, (iv) a third amino
acid region
having at least 95% identity to residues 99-230 from SEQ ID NO:541, or
conservative
substitutions thereof outside of conserved residues R100, Y101, K104, D107,
M111, N112,
S115, M116, K176, L194, I197, N198, S201, H202, and M206; wherein, the third
amino
acid region can be located in the recombinant polypeptide from about residue
94 to about
residue 235; wherein, the recombinant glutathione S-transferase enzyme
functions to cleave
a beta-aryl ether.
[0018] In some embodiments, an amino acid substitution outside of the
conserved residues can
be a conservative substitution. And, in many embodiments, the amino acid
sequence can
function to cleave a beta-aryl ether.
[0019] The teachings are also directed to a method of cleaving a beta-aryl
ether bond, the method comprising contacting a polypeptide taught herein with
a lignin-derived compound having (i) a beta-aryl ether bond and (ii) a
molecular weight ranging from about 180 Daltons to about 3000 Daltons;
wherein, the contacting occurs in a solvent environment in which the
lignin-derived compound is soluble.
[0020] In some embodiments, the lignin-derived compound has a molecular weight
of about 180 Daltons to about 1000 Daltons. In some embodiments, the solvent
environment
comprises water. And, in some embodiments, the solvent environment comprises a
polar
organic solvent.
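As a non-limiting illustration of the molecular weight windows recited above, the following Python sketch computes a molecular weight from a simple molecular formula and tests it against the range of about 180 Daltons to about 3000 Daltons. The formula C17H20O6 for guaiacylglycerol-beta-guaiacyl ether (GGE), the rounded atomic weights, and the function names are assumptions introduced for illustration; they are not part of the teachings.

```python
import re

# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "S": 32.06}


def molecular_weight(formula: str) -> float:
    """Molecular weight of a simple formula such as 'C17H20O6' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return total


def in_substrate_window(mw: float, low: float = 180.0, high: float = 3000.0) -> bool:
    """True if the molecular weight lies in the stated window (inclusive)."""
    return low <= mw <= high


# Assumed formula for the model lignin dimer GGE.
gge_mw = molecular_weight("C17H20O6")
gge_in_broad_window = in_substrate_window(gge_mw)
gge_in_narrow_window = in_substrate_window(gge_mw, high=1000.0)
```

Under these assumptions a dimer of roughly 320 Daltons falls inside both the broader and the narrower window, whereas a single phenol ring of roughly 94 Daltons would fall below the lower bound.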
[0021] The teachings are also directed to a system for bioprocessing lignin-
derived compounds,
the system comprising a polypeptide taught herein, a lignin-derived compound
having a
beta-aryl ether bond and a molecular weight ranging from about 180 Daltons to
about 3000 Daltons; and, a solvent in which the lignin-derived compound is
soluble;
wherein, the system
functions to cleave the beta-aryl ether bond by contacting the polypeptide
with the lignin-
derived compound in the solvent.
[0022] The teachings are also directed to a recombinant polynucleotide
comprising a nucleotide
sequence that encodes a polypeptide taught herein. Likewise, the teachings are
also
directed to a vector or plasmid comprising the polynucleotide, as well as a
host cell
transformed by the vector or plasmid to express the polypeptide.
[0023] The teachings are also directed to a method of cleaving a beta-aryl
ether bond, the
method comprising (i) culturing a host cell taught herein under conditions
suitable to
produce a polypeptide taught herein; (ii) recovering the polypeptide from the
host cell
culture; and, (iii) contacting the polypeptide of claim 1 with a lignin-
derived compound having
a beta-aryl ether bond and a molecular weight ranging from about 180 Daltons
to about 3000 Daltons; wherein, the contacting occurs in a solvent environment
in which
the lignin-
derived compound is soluble.
[0024] In some embodiments, the host cell can be E. coli or an Azotobacter
strain, such as Azotobacter vinelandii. And, in some embodiments, the
lignin-derived compound can have a molecular weight of about 180 Daltons to
about 1000 Daltons.
[0025] The teachings are also directed to a system for bioprocessing lignin-
derived compounds,
the system comprising (i) a transformed host cell taught herein; (ii) a lignin-
derived
compound having a beta-aryl ether bond and a molecular weight ranging from
about 180 Daltons to about 3000 Daltons; and, (iii) a solvent in which the
lignin-derived compound is
soluble; wherein, the system functions to cleave the beta-aryl ether bond by
contacting a
polypeptide taught herein with the lignin-derived compound in the solvent.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] FIGs. 1A and 1B illustrate general concepts of the biorefinery and
discovery processes
discussed herein, according to some embodiments.
[0027] FIG. 2 illustrates the structures of some building block chemicals that
can be produced
using bioconversions, according to some embodiments.
[0028] FIG. 3 is an example of a beta-etherase catalyzed hydrolysis of a model
lignin dimer, α-O-(β-methylumbelliferyl) acetovanillone (MUAV), according to
some embodiments.
[0029] FIG. 4 illustrates unexpected results from biochemical activity assays
for beta-etherase
function for the S. paucimobilis positive control polypeptides, and the N.
aromaticivorans
putative beta-etherase polypeptide, according to some embodiments.
[0030] FIG. 5 illustrates beta-aryl-ether compounds to be tested as substrates
representing
native lignin structures, according to some embodiments.
[0031] FIG. 6 illustrates pathways of guaiacylglycerol-β-guaiacyl ether (GGE)
metabolism by S.
metabolism by S.
paucimobilis, according to some embodiments.
[0032] FIG. 7 illustrates an example of a biochemical process for the
production of catechol
from lignin oligomers, according to some embodiments.
[0033] FIG. 8 illustrates an example of a biochemical process for the
production of vanillin from
lignin oligomers, according to some embodiments.
[0034] FIG. 9 illustrates an example of a biochemical process for the
production of 2,4-
diaminotoluene from lignin oligomers, according to some embodiments.
[0035] FIG. 10 illustrates process schemes for additional product targets that
include ortho-
cresol, salicylic acid, and aminosalicylic acid, for the production of
valuable chemicals from
lignin oligomers, according to some embodiments.
DETAILED DESCRIPTION OF THE INVENTION
[0036] This invention is generally directed to a recombinant method of
producing enzymes for
use in the bioconversion of lignin-derived compounds to valuable aromatic
chemicals.
Currently, the art is limited in its ability to control the degradation of
lignin to produce useful products, as it is limited in its knowledge of
enzymes that are capable of selectively
converting lignin into desired aromatic compounds. Generally, the art knows
two basic
things: (1) lignin is complex; and (2) bacterial lignin degradation systems
are therefore at
least as complex as lignin itself. Accordingly, and for at least these
reasons, the teachings
provided herein offer a valuable, unexpected, and surprising set of systems,
methods, and
compositions of matter that will be useful in the production of industrially
useful aromatic
chemicals.
[0037] FIGs. 1A and 1B illustrate general concepts of the biorefinery and
discovery processes discussed herein, according to some embodiments. FIG. 1A
shows a generalized
example
of a use of recombinant microbial strains in biotransformations for the
production of aromatic
chemicals from lignin-derived compounds. Biorefinery process 100 converts a
soluble
biorefinery lignin 105 through a series of biotransformations using a
transformed host cell.
The biorefinery lignin 105 is a feedstock comprising a lignin-derived compound
which can
be, for example, a combination of lignin-derived monomers and oligomers.
"Biotransformation 1" 107 can be used to selectively cleave a bond on or
between
monomers to create additional lignin monomers 110. "Biotransformation 2" 112
can be used
to selectively cleave an additional bond on or between monomers to create mono-
aromatic
commercial products 115. FIG. 1B shows a discovery process 120, which includes
selecting
a host cell strain that is tolerant to toxic lignin-derived compounds. The
strain acquisition
125 includes growth of the strain, sample preparation, and storage. A set of
bacterial strains is obtained for testing strain tolerance to soluble
biorefinery lignin samples.
[0038] In some embodiments, the strains can be selected for having (i)
well-characterized aromatic and xenobiotic metabolisms; (ii) annotated genome
sequences; and (iii) prior use in fermentation processes at pilot or larger
scales. Examples of strains can include, but are not limited to, Azotobacter
vinelandii (ATCC BAA-1303 DJ), Azotobacter chroococcum (ATCC 4412 (EB Fred)
X-50), Pseudomonas putida (ATCC BAA-477 Pf-5), and Pseudomonas fluorescens
(ATCC 29837 NCTC 1100). Strains can be streaked on relevant rich media plates
as described by the accompanying ATCC literature for revival. Individual
colonies (5 each) can be picked and cultured on relevant liquid media to
saturation. Culture samples prepared in a final glycerol concentration of
12.5% can be flash-frozen and stored at -80°C.
[0039] The model substrate synthesis 150 for use in the biochemical screening
for selective
activity can be outsourced through a contract research organization (CRO). The
enzyme
discovery effort can initially be focused on potential beta-etherase candidate
genes identified through bioinformatic methods. The identification of
candidates having beta-etherase activity is the first step towards generating
lignin monomers from lignin oligomers present in soluble lignin streams. The
fluorescent substrate α-O-(β-methylumbelliferyl) acetovanillone (MUAV), for
example, can be used in in vitro assays to identify beta-etherase function
(Acme Biosciences, Mt. View, CA). The formation of 4-methylumbelliferone (4MU)
upon hydrolysis of the aryl ether bond can be monitored by fluorescence, for
example, at λex = 365 nm and λem = 450 nm (or 460 nm).
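As a hedged illustration of how a fluorescence reading might be converted to an amount of 4MU released, the following Python sketch fits a linear standard curve to 4MU standards and inverts it for an assay reading. The standard concentrations, the fluorescence values, and the function names are hypothetical, and a linear detector response over the working range is assumed rather than taught.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fluorescence_to_4mu(signal, slope, intercept):
    """Invert the standard curve to estimate the 4MU concentration."""
    return (signal - intercept) / slope


# Hypothetical 4MU standards: concentration (uM) vs. fluorescence (arbitrary units).
standards_um = [0.0, 1.0, 2.0, 4.0]
standards_rfu = [50.0, 250.0, 450.0, 850.0]

slope, intercept = fit_line(standards_um, standards_rfu)
released_um = fluorescence_to_4mu(650.0, slope, intercept)
```

The intercept captures background fluorescence at zero 4MU, so subtracting it before dividing by the slope avoids overstating the amount of product formed.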
[0040] The gene synthesis, cloning, and transformation step 145 can include
combining
bioinformatic methods with known information about enzymes showing a desired,
selective
enzyme activity. For example, bioinformatics can produce a putative beta-
etherase
sequence that shares a significant homology to the S. paucimobilis ligE and
ligF beta-etherase sequences. See Masai, E., et al. Journal of Bacteriology
(3):1768-1775 (2003) ("Masai"), which is hereby incorporated herein in its
entirety by reference. The
S. paucimobilis sequences can be used as positive controls for biochemical
assays to show
relative activities in an enzyme discovery strategy.
[0041] The gene synthesis, cloning, and transformation step 145 can be
performed using any
method known to one of skill. For example, all genes can be synthesized
directly as open
reading frames (ORFs) from oligonucleotides by using standard PCR-based
assembly
methods, and using the E. coli codon bias. The end sequences can contain
adaptors
(BamHI and HindIII) for restriction digestion and cloning into the E. coli
expression vector
pET24a (Novagen). Internal BamHI and HindIII sites can be excluded from the
ORF sequences during design of the oligonucleotides. Assembled genes can be
cloned into the proprietary cloning vector (pG0V4), transformed into E. coli
CH3 chemically competent cells, and DNA sequences determined (Tocore Inc.)
from purified plasmid DNA. After sequence verification, restriction digestion
can be used to excise each ORF fragment from the cloning vector, and the
sequence can be sub-cloned into pET24a. The entire set of ligE and ligF
bearing plasmids can then be transformed into E. coli BL21 (DE3)
which can serve
as the host strain for beta-etherase expression and biochemical testing.
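As a non-limiting illustration of the design check described above, in which internal BamHI and HindIII sites are excluded from each ORF before the adaptors are added, the following Python sketch scans an ORF for the two recognition sequences. The recognition sequences GGATCC (BamHI) and AAGCTT (HindIII) are the standard ones; the toy ORFs, the function names, and the placement of BamHI at the 5' end and HindIII at the 3' end are assumptions for illustration.

```python
# Both recognition sites are palindromic, so scanning one strand is sufficient.
RESTRICTION_SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}


def internal_sites(orf: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each recognition site found in the ORF."""
    orf = orf.upper()
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = []
        start = orf.find(site)
        while start != -1:
            positions.append(start + 1)
            start = orf.find(site, start + 1)
        hits[enzyme] = positions
    return hits


def add_cloning_adaptors(orf: str) -> str:
    """Flank the ORF with a 5' BamHI site and a 3' HindIII site (assumed order)."""
    return RESTRICTION_SITES["BamHI"] + orf + RESTRICTION_SITES["HindIII"]


# Hypothetical toy ORFs, not sequences from the Sequence Listing.
clean_orf = "ATGGCTAGCAAAGGTTAA"
bad_orf = "ATGGGATCCAAATAA"  # contains an internal BamHI site

clean_hits = internal_sites(clean_orf)
bad_hits = internal_sites(bad_orf)
insert_fragment = add_cloning_adaptors(clean_orf)
```

An ORF that reports any internal position would be cut within the coding sequence during restriction digestion, so in practice the codon choice at that position would be changed before synthesis.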
[0042] The enzyme screening 155 is done to identify novel etherases 160. The
fluorescent
substrate MUAV can be used to screen for and identify beta-etherase activity
from the
recombinant E. coli clones. Expression of the beta-etherase genes can be done
in 5 ml or 25 ml samples of the recombinant E. coli strains in LB medium using
induction with IPTG. Following induction and cell harvest, cell pellets can be
lysed using the BPER (Invitrogen) cell lysis system. Cell extracts can be
tested in the in vitro biochemical assay for beta-etherase activity on the
fluorescent substrate MUAV. The formation of 4-methylumbelliferone (4MU) upon
hydrolysis of the aryl ether bond in MUAV can be monitored by fluorescence at
λex = 365 nm and λem = 460 nm, and can provide quantitative measurement of
beta-etherase function. Cell extracts of E. coli transformed with the S.
paucimobilis ligE and ligF genes can be the assay positive controls. Test or
unknown samples can include, for example, E. coli strains expressing putative
beta-etherase genes from N. aromaticivorans.
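As a hedged illustration of how screening results might be compared against the positive controls, the following Python sketch expresses each extract's background-subtracted fluorescence as a percentage of the positive control and keeps the extracts above a cutoff. The readings, the 10% cutoff, the use of a no-enzyme blank, and the function names are hypothetical assumptions, not values or steps taught herein.

```python
def relative_activity(sample_rfu, blank_rfu, positive_rfu):
    """Background-subtracted signal as a percentage of the positive control."""
    control_signal = positive_rfu - blank_rfu
    if control_signal <= 0:
        raise ValueError("positive control must exceed the blank")
    return 100.0 * (sample_rfu - blank_rfu) / control_signal


def screen(extracts, blank_rfu, positive_rfu, threshold_pct=10.0):
    """Return {name: percent activity} for extracts at or above the cutoff."""
    results = {
        name: relative_activity(rfu, blank_rfu, positive_rfu)
        for name, rfu in extracts.items()
    }
    return {name: pct for name, pct in results.items() if pct >= threshold_pct}


# Hypothetical fluorescence readings (arbitrary units).
extract_readings = {"candidate_A": 900.0, "candidate_B": 120.0}
hits = screen(extract_readings, blank_rfu=100.0, positive_rfu=500.0)
```

A value above 100% in this sketch indicates an extract that produced more 4MU than the positive control under the same conditions.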
[0043] The lignin stream acquisition 130 includes obtaining a waste lignin
stream from a biorefinery for testing. A preliminary characterization of one
source of such lignin has shown an aromatic monomer concentration of less than
1 g/L and an oligomer concentration of ~10 g/L. Oligomers appear to be
associated with carbohydrates in a 10:1 ratio for sugar:phenolics. Some
information exists on compounds in the liquid stream, including benzoic acid,
vanillin, syringic acid and ferulics, which are routinely quantified in
soluble samples. An average molecular weight of ~280 has been established for
the monomers; and the oligomeric components remain to be characterized.
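As a worked illustration of the figures above, the following Python sketch converts the stated monomer concentration bound to a molar concentration using the stated average molecular weight of about 280. The function name is hypothetical, and treating the average molecular weight as exact is a simplifying assumption.

```python
def molar_concentration_mM(grams_per_litre: float, molecular_weight: float) -> float:
    """Convert a mass concentration (g/L) to a molar concentration (mM)."""
    return 1000.0 * grams_per_litre / molecular_weight


# Monomers: less than 1 g/L at an average molecular weight of about 280.
monomer_upper_mM = molar_concentration_mM(1.0, 280.0)
```

Under these assumptions the aromatic monomers are present at less than about 3.6 mM in the characterized stream.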
[0044] In the strain tolerance testing 135, strain tolerance will be
determined by cell growth upon exposure to biorefinery lignin. Tolerance to
the phenolic compounds in the biorefinery lignin waste stream will be
critically important to the bioprocess efficiency and high level production of
aromatic chemicals by microbial systems. Cell growth will be quantified as a
function of respiration by the reduction of soluble tetrazolium salts. XTT
(2,3-Bis(2-methoxy-4-nitro-5-sulfophenyl)-2H-tetrazolium-5-carboxanilide inner
salt, Sigma) is reduced to a soluble purple formazan compound by respiring
cells. The formazan product will be detected and quantified by absorbance at
450 nm.
[0045] Strain tolerance testing 135 on soluble lignin can be done in liquid
format in 48 well
plates, for example. Each strain can be tested in replicates of 8, for
example, and E. coli
can be used as a negative control strain. Strains can first be grown in rich
medium to
saturation, washed, and OD600nm of the cultures determined. Equal numbers of
bacteria
can be inoculated into wells of the 48-well growth plate containing minimal
medium
excluding a carbon source. Increasing concentrations of soluble lignin
fractions, in addition
to a minus-lignin positive control, can be added to the wells containing each
species to a
final volume of 0.8 ml. A benzoic acid content analysis of the lignin
fractions can be used as an internal indicator of the phenolic content of
lignin wastes of different origin. Following incubation for 24-48 hours with
shaking at 30°C, the cultures can be tested for growth upon exposure to the
lignin fraction using an XTT assay kit. Culture samples can be removed from
the 48-well growth plate and diluted appropriately in 96-well assay plates to
which the XTT reagent can be added. The soluble formazan produced will be
quantified by absorbance at 450 nm. Bacterial strains exhibiting the highest
level of growth,
and therefore
tolerance, can be candidates for further development as host strains for
lignin conversions.
[0046] The strain demonstrated to have the best tolerance characteristics can
be transformed
with the beta-etherase gene identified as showing the highest biochemical
activity.
Restriction digestion can be used to excise the ORF fragment from the cloning
vector, and
the sequence can be sub-cloned into the shuttle vector pMMB206. Constructs
cloned in the
shuttle vector can be transformed into Azotobacter or Pseudomonas strains by
electroporation, or chemical transformation. The recombinant, lignin tolerant
host strain can
be re-tested for beta-etherase expression and activity using any methods known
to one of
skill, such as those described herein, adapted to the particular host strain
being used.
Feedstock from biorefinery processes
[0047] An example of a starting material might be pretreated lignocellulosic
biomass. In some
embodiments, the lignocellulose biomass material might include grasses, corn
stover, rice
hull, agricultural residues, softwoods and hardwoods. In some embodiments, the
lignin-
derived compounds might be derived from hardwood species such as poplar from
the Upper
Peninsula region of Michigan, or hardwoods such as poplar, loblolly pine, and
eucalyptus
from Virginia and Georgia areas, or mixed hardwoods including maple and oak
species from
upstate New York.
[0048] In some embodiments, the pretreatment methods might encompass a range
of physical,
chemical and biological based processes. Examples of pretreatment methods used
to
generate the feedstock for Aligna processes might include physical
pretreatment, solvent
fractionation, chemical pretreatment, biological pretreatment, ionic liquids
pretreatment,
supercritical fluids pretreatment, or a combination thereof, for example,
which can be
applied in stages.
[0049] Physical pretreatment methods used to reduce the lignocellulose biomass
particle size might utilize mechanical stress methods of dry, wet vibratory
and compression-based ball milling procedures. Solvent fractionation methods
include
organosolve
processes, phosphoric acid fractionation processes, and methods using ionic
liquids to
pretreat the lignocellulose biomass to differentially solubilize and partition
various
components of the biomass. In some embodiments, organosolve methods might be
performed using alcohol, including ethanol, with an acid catalyst at
temperature ranges from about 90 to about 20°C, and from about 155 to about
220°C with residence time of about 25 minutes to about 100 minutes. Catalyst
concentrations can vary from about 0.83% to about 1.67% and alcohol
concentrations can vary from about 25% to about 74% (v/v). In some
embodiments, phosphoric acid fractionations of lignocellulose biomass might be
performed using a series of different extractions using phosphoric acid,
acetone, and water at a temperature of around 50°C. In some embodiments, ionic
liquid pretreatment of lignocellulose biomass might include use of ionic
liquids containing anions like chloride, formate, acetate, or
alkylphosphonate, with biomass:ionic liquid ratios of approximately 1:10
(w/w). The pretreatment might be performed at temperatures ranging from about
100°C to about 150°C. Other ionic liquid compounds that might be used include
1-butyl-3-methyl-imidazolium chloride and 1-ethyl-3-methylimidazolium
chloride.
[0050] Chemical pretreatments of lignocellulose biomass material might be
performed using
technologies that include acidic, alkaline and oxidative treatments. In some
embodiments,
acidic pretreatment methods of lignocellulose biomass such as those described
below might
be applied. Dilute acid pretreatments might use sulfuric acid at
concentrations in the approximate range of about 0.05% to about 5%, and
temperatures in the range of about 160°C to about 220°C. Steam explosion might
be performed at temperatures between about 160°C and about 290°C, with or
without the addition, before steam explosion, of catalysts such as sulfuric
acid, nitric acid, carbonic acid, succinic acid, fumaric acid, maleic acid,
citric acid, sulfur dioxide, sodium hydroxide, or ammonia. Liquid hot water
treatment might be performed at pressures >5 MPa, at temperatures ranging from
about 160°C to about 230°C, and at a pH between about 4 and about 7. And, in
some embodiments, alkaline pretreatment methods using catalysts such as
calcium oxide, ammonia, and sodium hydroxide might be used. The ammonia fiber
expansion (AFEX) method might be applied in which concentrated ammonia at
about 0.3 kg to about 2 kg of ammonia per kg of dry weight biomass is used at
about 60°C to about 140°C in a high pressure reactor, and cooked for 5-45
minutes before rapid pressure release. The ammonia recycle percolation (ARP)
method might be used in flow through mode by percolating ammoniacal solutions
at 5-15% concentrations at high temperatures and pressures. Oxidative
pretreatment methods such as alkaline wet oxidation might be used with sodium
carbonate at a temperature ranging from about 170°C to about 220°C in a high
pressure reactor using pressurized air/oxygen mixtures or hydrogen peroxide as
the oxidants.
[0051] Biological pretreatment methods using white rot basidiomycetes and
certain
actinomycetes might be applied. One type of product stream from such
pretreatment
methods might be soluble lignin, and might contain lignin-derived monomers and
oligomers
in the range of about 1g/L to about 10g/L, and xylans. The lignin-derived
monomers might
include compounds such as gallic acid, hydroxybenzoate, ferulic acid,
hydroxymethyl
furfural, hydroxymethyl furfural alcohol, vanillin, homovanillin, syringic
acid, syringaldehyde,
and furfural alcohol.
[0052] Supercritical fluid pretreatment methods might be used to process the
biomass.
Examples of supercritical fluids for use in processing biomass include
ethanol, acetone,
water, and carbon dioxide at temperatures and pressures above the critical
points for ethanol and carbon dioxide but at a temperature and/or pressure
below that of the critical point for water.
[0053] Combinations of steam pretreatment and biological pretreatment methods
might be
applied. For example, a biomass can be steam pretreated at 195°C for 10 min at
controlled pH, followed by enzymatic treatment using commercial cellulases and
xylanases at dosings of 100 mg protein/g total solid, and with incubation at
50°C at pH 5.0 with agitation of 500 rpm.
[0054] In some embodiments, combinations of hydrothermal, organosolve, and
biological
pretreatment methods might be used. One example of such a combination is a
three-stage process:
Stage 1. Use heat in an aqueous medium at a predetermined pH, temperature and
pressure for the hydrothermal process;
Stage 2. Use at least one organic solvent from those described in 6-6c in
water for the
organosolve step;
Stage 3. Use yeast, white rot basidiomycetes, actinomycetes, and cellulases and
xylanases
in native or recombinant forms for the biological pretreatment step.
[0055] Soluble lignin fractions derived using organosolve methods might
produce soluble lignins
in the molecular weight range of 188-1000, soluble in various polar solvents.
Without
intending to be bound by any theory or mechanism of action, organosolve
processes are
generally believed to maintain the lignin beta-aryl ether linkage.
[0056] Lignin streams from steam exploded lignocellulosic biomass might be
used. Steam
explosion might be performed, for example, using high pressure steam in the
range of about
200 psi to about 500 psi, and at temperatures ranging from about 180°C to about
230°C for
about 1 minute to about 20 minutes in batch or continuous reactors. The lignin
might be
extracted from the steam-exploded material with alkali washing or extraction
using organic
solvents. Steam exploded lignins can exhibit properties similar to those
described for
organosolve lignins, retaining native bond structures and containing about 3
to about 12
aromatic units per oligomer unit.
[0057] Supercritical fluid pretreatment can produce soluble lignin fractions
that can be used with
the teachings provided herein. Such processes typically yield monomers and
lignin
oligomers having a molecular weight of less than about 1000 Daltons.
[0058] Biological pretreatment can produce soluble lignin fractions that can
be used with the
teachings provided herein. Such lignin streams might contain xylans together
with lignin monomers and oligomers that are present in the range of about 1 g/L
to about 10 g/L and have a molecular weight of less than about 1000 Daltons.
The lignin-derived monomers might include compounds
such
as gallic acid, hydroxybenzoate, ferulic acid, hydroxymethyl furfural,
hydroxymethyl furfural
alcohol, vanillin, homovanillin, syringic acid, syringaldehyde, and furfural
alcohol.
Feedstock from wood pulping processes
[0059] Wood pulping processes produce a variety of lignin types, the type of
lignin dependent
on the type of process used. Chemical pulping processes include, for example,
Kraft and
sulfite pulping.
[0060] In some embodiments, the lignin-derived compound can be derived from a
spent pulping
liquor or "black liquor" from Kraft pulping processes. Kraft lignin might be
derived from batch
or continuous processes using, for example, reaction temperatures in the range
of about
150°C to about 200°C and reaction times of approximately 2 hours. Any range of
molecular weights of lignin may be obtained, and the useful fraction may range,
in some embodiments, from about 200 Daltons to about 4000 Daltons. A Kraft
lignin having a molecular weight ranging from about 1000 Daltons to about 3000
Daltons might be used in a
bioconversion.
[0061] In some embodiments, lignin from a sulfite pulping process might be
used. A sulfite
pulping process can include, for example, a chemical sulfonation using aqueous
sulfur
dioxide, bisulfite and monosulfite at a pH ranging from about 2 to about 12.
The sulfonated
lignin might be recovered by precipitation with excess lime as
lignosulfonates. Alternatively,
formaldehyde-based methylation of the lignin aromatics followed by sulfonation
might be
performed. Any range of molecular weights of lignin may be obtained, and the
useful
fraction may range, in some embodiments, from about 200 Daltons to about 4000
Daltons. A sulfite lignin having a molecular weight ranging from about 1000
Daltons to about 3000 Daltons might be used in a bioconversion.
Characterization of lignin-derived compounds for use in bioconversion
[0062] Optimization of a system for a particular feedstock should include an
understanding of
the composition of the particular feedstock. For example, one of skill will
appreciate that the
composition of a native lignin can be significantly different from the
composition of the lignin-derived compounds in a given lignin fraction that is
used for a feedstock. Accordingly, an
understanding of the composition of the feedstock will assist in optimizing
the conversion of
the lignin-derived compounds to the valuable aromatic compounds. Any method
known to
one of skill can be used to characterize the compositions of the feedstock.
For example,
one of skill may use wet chemistry techniques, such as thioacidolysis and
nitrobenzene
oxidation, coupled with gas chromatography, which have been used
traditionally, or
spectroscopic techniques such as NMR and FTIR. Thioacidolysis, for example,
cleaves the
β-O-4 linkages in lignin, giving rise to monomers and dimers which are then
used to
calculate the S and G content. Similar information can be obtained using
nitrobenzene
oxidation, but the ratios are thought to be less accurate. In some
embodiments, the content
of S, G, and H, as well as their relative ratios can be used to characterize
feedstock
compositions for purposes of determining a bioconversion system design.
[0063] It is widely accepted that the biosynthesis of lignin stems from the
polymerization of
three types of phenylpropane units, also referred to as monolignols. These
units are
coniferyl, sinapyl, and p-coumaryl alcohol. The three structures are as
follows:
[chemical structure: p-coumaryl alcohol (H)];
[chemical structure: coniferyl alcohol (G)]; and,
[chemical structure: sinapyl alcohol (S)].
[0064] Tables 1A and 1B summarize distributions of p-coumaryl alcohol or
p-hydroxyphenyl (H), coniferyl alcohol or guaiacyl (G), and sinapyl alcohol or
syringyl (S) lignin in several sources of biomass. Table 1A compares percent
lignin in the biomass to the G:S:H.
[0065] Table 1A.
Biomass                       Lignin (%)    G     S     H
Wheat Straw                   16-21         45    46    9
Rice Straw                    6             45    40    15
Rye Straw                     18            43    53    1
Hemp                          8-13          51    40    9
Tall Fescue:
  Stems                       7-10          55    42    [illegible]
  Internodes                  11            48    50    [illegible]
Flax                          21-34         67    29    4
Jute                          15-26         36    62    [illegible]
Sisal                         7-14          22    76    [illegible]
Curaua Leaf Fiber             7             29    41    30
Banana Plant Leaf             41 50 7 [column assignment unclear]
Piassava Fiber (Palm Tree)    45            4.0   9     51
Abaca                         7-9           19    55    26
Loblolly Pine                 [illegible]   86    2     12
                              29            87    0     13
  Compression                 60 40 [column assignment unclear]
Spruce (Picea abies)          28            94    1     5
  MWL                         98 [illegible] 0 [column assignment unclear]
Eucalyptus globulus           22            14    84    2
Eucalyptus grandis            27            [illegible] 69    4
Birch pendula                 [illegible]   19    69    1
Beech                         26            56    40    4
Acacia                        18            48    49    1
Table 1B compares the location of a sample in the biomass, the species, and
environmental stress to the G:S:H.
Table 1B.
White Birch                             G     S
* Fiber, S2 layer                       12    88
* Vessel, S2 layer                      [illegible]
* Ray parenchyma, S layer               49    51
* Middle [illegible]                    91
* Middle lamella ([illegible])          [illegible] 20
* Middle lamella ([illegible])          100
* Middle lamella (ray/[illegible])      [illegible] 12
Lignin Samples                          [column headings illegible]
Carpinus betulus MWL                    19 1
[illegible] MWL                         35 6
[illegible] MWL
[illegible] kraft lignin                72 3
Eucalyptus kraft lignin                 2.2 73 6
Loblolly Pine [illegible]
* Normal                                95
* Wind, opposite                        96    4
* Wind, compression                     89    [illegible]
* Bent, opposite                        96    4
* Bent, compression                     [illegible] 12
[0066] In general, the relative amounts of G, S, and H in lignin can be a good
indicator of its
overall composition and response to a treatment, such as the bioconversions
taught herein.
In poplar species, for example, differences can be seen based on the
measurement technique as well as species, but in general the S/G ratio ranges
from 1.3 to 2.2. This is similar to the hardwood eucalyptus, but higher than
the herbaceous biomass sources switchgrass and Miscanthus. This is to be
expected given the higher H contents in grass lignin. An optimized nitrobenzene
oxidation method has been used to measure the S/G ratios of 13 poplar samples
from
two different sites, obtaining values ranging from 1.01 to 1.68. Further, a
linear correlation (R² = 0.85) has been found in poplar between decreasing
lignin content and increasing S/G ratios. The correlation was stronger
(R² = 0.93) in samples from a single site, suggesting a dependency on
geographic location.
[0067] Higher throughput methods can be used for rapid screening of
feedstocks. Examples of
such methods can include, but are not limited to, near-infrared (NIR),
reflectance
spectroscopy, pyrolysis molecular beam mass spectrometry (pyMBMS), Fourier
transform
infrared spectroscopy, a modified thioacidolysis technique, and whole cell NMR
after
dissolution in ionic liquids. Information on some structural characteristics
of lignin, such as S/G ratios, can be rapidly obtained using these methods. The
average S:G:H ratio of 104
poplar lignin samples, for example, was determined using the modified
thioacidolysis
technique, and was found to be 68:32:0.02. In some embodiments, the S, G, and
H
components in the ratio can be expressed as mass percent. In some embodiments,
the S,
G, and H components in the ratio can be expressed as any relative unit, or
unitless. Any
comparison can be used, if the amount of each component directly correlates
with the other
respective components in the composition. The ratios can be expressed in
relative whole
numbers or fractions as S:G:H, or any other order or combination of
components, S/G, G/S, and the like. In some embodiments, the S/G ratio is used.
In some embodiments, the S/G
ratio can range from about 0.20 to about 20.0, from about 0.3 to about 18.0,
from about 0.4
to about 15.0, from about 0.5 to about 15.0, from about 0.6 to about 12.0,
from about 0.7 to
about 10.0, from about 0.8 to about 8.0, from about 0.9 to about 9.0, from
about 1.0 to about
7.0, or any range therein. In some embodiments, the S/G ratio can be about
0.2, about 0.4,
about 0.6, about 0.8, about 1.0, about 1.2, about 1.4, about 1.6, about 1.8,
about 2.0, about
2.2, about 2.4, about 2.6, about 2.8, about 3.0, about 3.2, about 3.4, about
3.6, about 3.8,
about 4.0, about 4.2, about 4.4, about 4.6, about 4.8, about 5.0, about 5.2,
about 5.4, about
5.6, about 5.8, about 6.0, about 6.2, about 6.4, about 6.6, about 6.8, about
7.0, about 7.2,
about 7.4, about 7.6, about 7.8, about 8.0, about 8.2, about 8.4, about 8.6,
about 8.8, about
9.0, about 9.2, about 9.4, about 9.6, about 9.8, about 10.0, and any ratio in
between in 0.1
increments, and any range of ratios therein.
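As an illustrative sketch only, and not part of the methods taught herein, the following Python fragment shows how an S/G ratio might be computed from S, G, and H components expressed in any common relative unit and compared against a recited range. The function names sg_ratio, within_range, and normalize are hypothetical, and the default bounds simply restate the broadest range recited above (about 0.20 to about 20.0).

```python
def sg_ratio(s: float, g: float) -> float:
    """Return the S/G ratio for syringyl (S) and guaiacyl (G) amounts.

    Both amounts must be in the same relative unit (mass percent, counts, etc.).
    """
    if g <= 0:
        raise ValueError("G content must be positive to form an S/G ratio")
    return s / g


def within_range(ratio: float, low: float = 0.20, high: float = 20.0) -> bool:
    """Check a ratio against inclusive bounds (defaults are an assumption)."""
    return low <= ratio <= high


def normalize(s: float, g: float, h: float) -> tuple:
    """Rescale S, G, and H so that the three fractions sum to 1."""
    total = s + g + h
    if total <= 0:
        raise ValueError("at least one component must be positive")
    return s / total, g / total, h / total


# Average S:G:H reported above for 104 poplar samples: 68:32:0.02.
poplar_ratio = sg_ratio(68, 32)
print(poplar_ratio, within_range(poplar_ratio))
```

Because only the relative amounts matter, the same functions apply whether the components are given as mass percent or as unitless values.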
Fractionation of lignin-derived compounds for use in bioconversion
[0068] Soluble lignin streams derived from biorefinery or Kraft processes
might be used directly
in microbial conversions without additional purification or, they might be
further purified by
one or more of the separation or fractionation techniques prior to microbial
conversions.
[0069] In some embodiments, membrane filtration might be applied to achieve a
starting
concentration of lignin monomers and oligomers in the 1-60% (w/v)
concentration range,
and molecular weights ranging from about 180 Daltons to about 2000 Daltons,
from about 200 Daltons to about 4000 Daltons, from about 250 Daltons to about
2500 Daltons, from about 180 Daltons to about 3500 Daltons, from about 300
Daltons to about 3000 Daltons, or
any range therein.
[0070] In some embodiments, soluble lignin streams might be partially purified
by
chromatography using, for example, HP-20 resin. The lignin monomers and
oligomers can
bind to the resin while highly polar impurities or inorganics that might be
toxic to
microorganisms can remain un-bound. Subsequent elution, for example, with a
methanol-
water solvent system, can provide fractions of higher purity that are enriched
in lignin
monomers and oligomers.
Chemical products
[0071] A purpose of the present teaching includes the discovery of novel
biochemical
conversions that create valuable commercial products from various lignin core
structures.
Such commercial products include monomeric aromatic chemicals that can serve
as building
block chemicals. One of skill will appreciate that a vast number of aromatic
chemicals can
be produced using the principles provided by the teachings set forth herein,
and that a
comprehensive teaching of every possible chemical that can be produced would
be beyond
the scope and purpose of this teaching.
[0072] FIGs. 2A and 2B illustrate (i) the structures of some building block
chemicals that can be
produced using bioconversions, and (ii) an example enzyme system from a
Sphingomonas paucimobilis gene cluster, according to some embodiments. FIG. 2A
shows that examples of some monomeric aromatic structures that can serve as
building block chemicals derived from lignin include, but are not limited to,
guaiacol, β-hydroxypropiovanillone, 4-hydroxy-3-methoxymandelic acid,
coniferaldehyde, ferulic acid, eugenol, propylguaiacol, and 4-acetylguaiacol.
It should be appreciated that each of these structures can be produced using
the teachings provided herein. FIG. 2B(i) shows the organization of the LigDFEG
gene cluster in a Sphingomonas paucimobilis strain. FIG. 2B(ii) shows deduced
functions of the gene products believed to be involved in a β-aryl ether bond
cleavage in a model lignin structure, guaiacylglycerol-β-guaiacyl ether (GGE).
The vertical bars above the restriction map indicate the positions of the gene
insertions of LigD, LigF, LigE, and LigG.
LigD showed Cα-dehydrogenase activity, LigF and LigE showed β-etherase
activity, and LigG showed glutathione lyase activity. FIG. 2 LEGEND
(Abbreviations): restriction enzymes Ap (ApaI), Bs (BstXI), E (EcoRI), Ec
(Eco47III), Ml (MluI), P (PstI), RV (EcoRV), S (SalI), Sc (SacI), ScII (SacII),
St (StuI), Sm (SmaI), Tt (Tth111I), and X (XhoI); chemicals GGE
(guaiacylglycerol-β-guaiacyl ether), GSH (glutathione), and GSSG (glutathione
disulfide); asterisks indicate asymmetric carbons.
[0073] Commercial products that can be obtained from a bioconversion of lignin-
derived
compounds, as taught herein, include mono-aromatic chemicals. Examples of such
chemicals include, but are not limited to, caprolactam, cumene, styrene,
mononitro- and
dinitrotoluenes and their derivatives, 2,4-diaminotoluene, 2,4-dinitrotoluene,
terephthalic
acid, catechol, vanillin, salicylic acid, aminosalicylic acid, cresol and
isomers, alkylphenols,
chlorinated phenols, nitrophenols, polyhydric phenols, nitrobenzene, aniline
and secondary
and tertiary aniline bases, benzothiazole and derivatives, alkylbenzene and
alkylbenzene
sulfonates, 4,4'-diphenylmethane diisocyanate (MDI), chlorobenzenes and
dichlorobenzenes, nitrochlorobenzenes, sulfonic acid derivatives of toluene,
pseudocumene, mesitylene, nitrocumene, and cumenesulfonic acid.
Enzyme discovery
[0074] The teachings herein are also directed to the discovery of novel
enzymes. In some
embodiments, the enzymes are beta-etherase enzymes.
[0075] Lignin is the only plant biomass constituent based on aromatic core
structures, and is
comprised of branched phenylpropenyl (C9) units. The guaiacol and syringol
building
blocks of lignin are linked through carbon-carbon (C-C) and carbon-oxygen
(C-O, ether) bonds. The native structure of lignin suggests its key application
as a chemical feedstock for aromatic chemicals. The production of such chemical
structures necessitates depolymerization and rupture of C-C and C-O bonds. An
abundant chemical linkage in lignin is the beta-aryl ether linkage, which
accounts for 50% to 70% of the bonds in lignin.
The efficient scission of the beta-aryl ether bond would generate the
monomeric building
blocks of lignin, and provide the chemical feedstock for subsequent conversion
to a range of
industrial products.
[0076] The beta-etherase enzyme system has multiple advantages for conversions
of lignin
oligomers to monomers over the laccase enzyme systems. The beta-etherase
enzyme
system would achieve highly selective reductive bond scission catalysis for
efficient and
high yield conversions of lignin oligomers to monomers without the formation
of side
products, degradation of the aromatic core structures of lignin, or the use of
electron transfer
mediators required with use of the oxidative and radical chemistry-based
laccase enzyme
systems.
[0077] FIG. 3 is an example of a beta-etherase catalyzed hydrolysis of a model
lignin dimer, α-O-(β-methylumbelliferyl)acetovanillone (MUAV), according to
some embodiments. The scission of the beta-aryl ether bond in model compounds
of lignin by beta-etherases from the microbe Sphingomonas paucimobilis has been
described. However, the
available
information is limited, and there is no precedent in the literature for the
use of S.
paucimobilis as an industrial microbe for commercial scale processes. The
discovery of new
beta-etherase enzymes, and the heterologous expression of these new enzymes in
Azotobacter strains will provide the art with valuable industrial strains that
are particularly well-suited for lignin conversion processes.
[0078] One of skill will recognize the chemical nomenclature used herein as
standard to the art.
For example, the amino acids used herein can be identified by at least the
following
conventional three-letter abbreviations in Table 2:
[0079] Table 2.
Alanine         A   Ala     Leucine         L   Leu
Arginine        R   Arg     Lysine          K   Lys
Asparagine      N   Asn     Methionine      M   Met
Aspartic acid   D   Asp     Phenylalanine   F   Phe
Cysteine        C   Cys     Proline         P   Pro
Glutamic acid   E   Glu     Serine          S   Ser
Glutamine       Q   Gln     Threonine       T   Thr
Glycine         G   Gly     Tryptophan      W   Trp
Histidine       H   His     Tyrosine        Y   Tyr
Isoleucine      I   Ile     Valine          V   Val
Ornithine       O   Orn     Other               Xaa
[0080] The single letter identifier is provided for ease of reference, but any
format can be used.
The three-letter abbreviations are generally accepted in the peptide art,
recommended by
the IUPAC-IUB commission in biochemical nomenclature, and are provided to
comply with
WIPO Standard ST.25. Furthermore, the peptide sequences are taught according
to the
generally accepted convention of placing the N-terminus on the left and the C-
terminus on
the right of the sequence listing to again comply with WIPO Standard ST.25.
The Recombinant Polypeptides
[0081] The teachings herein are based on discovery of novel and non-obvious
proteins, DNAs,
and host cell systems that can function in the conversion of lignin-derived
compounds into
valuable aromatic compounds. The systems can include natural, wild-type
components or
recombinant components, the recombinant components being isolatable from what
occurs in
nature.
[0082] The term "isolated" means altered "by the hand of man" from its natural
state; i.e., if it
occurs in nature, it has been changed or removed from its original
environment, or both. For
example, a naturally occurring polynucleotide or a polypeptide naturally
present in a living
animal in its natural state is not "isolated," but the same polynucleotide or
polypeptide
separated from the coexisting materials of its natural state is "isolated", as
the term is used
herein. For example, with respect to polynucleotides, the term isolated means
that it is
separated from the chromosome and cell in which it naturally occurs. However,
a nucleic
acid molecule contained in a clone that is a member of a mixed clone library
(e.g., a
genomic or cDNA library) and that has not been isolated from other clones of
the library
(e.g., in the form of a homogeneous solution containing the clone without
other members of
the library) or a chromosome isolated or removed from a cell or a cell lysate
(e.g., a
"chromosome spread", as in a karyotype), is not "isolated" for the purposes of
the teachings
herein. Moreover, a lone nucleic acid molecule contained in a preparation of
mechanically
or enzymatically cleaved genomic DNA, where the isolation of the nucleic
acid molecule was not
the goal, is also not "isolated" for the purposes of the teachings herein. As
part of, or
following, an intentional isolation, polynucleotides can be joined to other
polynucleotides, for
mutagenesis, to form fusion proteins, and for propagation or expression in a
host, for
instance. Isolated polynucleotides, alone or joined to other polynucleotides
such as vectors,
can be introduced into host cells, in culture or in whole organisms, after
which such DNAs
still would be isolated, as the term is used herein, because they would not be
in their
naturally occurring form or environment. Similarly, the isolated
polynucleotides and
polypeptides may occur in a composition, such as a media formulation,
solutions for
introduction of polynucleotides or polypeptides, for example, into cells,
compositions or
solutions for chemical or enzymatic reactions, for instance, which are not
naturally occurring
compositions, and, therein remain "isolated" polynucleotides or polypeptides
within the
meaning of that term as it is used herein.
[0083] A "vector," such as an expression vector, is used to transfer or
transmit the DNA of
interest into a prokaryotic or eukaryotic host cell, such as a bacterial cell,
a yeast cell, or a higher
eukaryotic cell. Vectors can be recombinantly designed to contain a
polynucleotide
encoding a desired polypeptide. These vectors can include a tag, a cleavage
site, or a
combination of these elements to facilitate, for example, the process of
producing, isolating,
and purifying a polypeptide. The DNA of interest can be inserted as the
expression
component of a vector. Examples of vectors include plasmids, cosmids, viruses,
and
bacteriophages. If the vector is a virus or bacteriophage, the term vector can
include the
viral/bacteriophage coat. The term "expression vector" is usually used to
describe a DNA
construct containing a gene encoding an expression product of interest, usually
a protein, that
is expressed by the machinery of the host cell. This type of vector is
frequently a plasmid,
but the other forms of expression vectors, such as bacteriophage vectors and
viral vectors
(e.g., adenoviruses, replication defective retroviruses, and adeno-associated
viruses), can
be used.
[0084] In some embodiments, the polypeptides taught herein can be natural or
wildtype,
isolated and/or recombinant. In some embodiments, the polynucleotides can be
natural or
wildtype, isolated and/or recombinant. In some embodiments, the teachings are
directed to
a vector that can include such a polynucleotide or a host cell transformed by
such a vector.
[0085] In some embodiments, the polypeptide can be an isolated recombinant
polypeptide,
comprising an amino acid sequence having at least 95% identity to SEQ ID
NO:101. The
sequence can conserve residues T19, I20, S21, P22, V24, W25, T27, K28, Y29,
A30, H33, K34, G35, F36, D39, I40, V41, P42, G43, G44, F45, G47, I48, E50, R51,
T52, G53,
G54,
K100, A101, N104, V111, G112, M115, F116, P166, W107, Y184, Y187, R188, G191,
G192, and F195.
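One of skill might screen a candidate sequence against criteria of this kind computationally. The following Python sketch is a simplified illustration under stated assumptions: the two sequences are already aligned and of equal length (gaps are not handled), residue labels use the one-letter code followed by a 1-based position, and the short sequences shown are hypothetical placeholders rather than SEQ ID NO:101. The function names are likewise hypothetical.

```python
import re


def percent_identity(reference: str, candidate: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference, candidate))
    return 100.0 * matches / len(reference)


def parse_residue(label: str) -> tuple:
    """Split a label such as 'T19' into ('T', 19)."""
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return match.group(1), int(match.group(2))


def conserves(candidate: str, labels: list) -> bool:
    """True if the candidate carries every labeled residue at its position."""
    for label in labels:
        amino_acid, position = parse_residue(label)
        if position > len(candidate) or candidate[position - 1] != amino_acid:
            return False
    return True


# Hypothetical 10-residue toy sequences, for illustration only.
toy_reference = "MTISPAVWAT"
toy_conserved = ["T2", "I3", "S4"]
variant_a = "MTISPGVWAT"  # substitution outside the conserved positions
variant_b = "MTLSPAVWAT"  # substitution at conserved position I3

for variant in (variant_a, variant_b):
    print(percent_identity(toy_reference, variant), conserves(variant, toy_conserved))
```

Both toy variants share the same percent identity with the reference, but only the first retains all of the labeled positions, which is why the two checks are kept separate.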
[0086] In some embodiments, the polypeptide can be an isolated recombinant
polypeptide,
comprising SEQ ID NO:101; or conservative substitutions thereof outside of the
conserved
residues. The conserved residues can include T19, I20, S21, P22, V24, W25, T27,
K28,
Y29, A30, H33, K34, G35, F36, D39, I40, V41, P42, G43, G44, F45, G47, I48, E50,
R51,
T52, G53, G54; K100, A101, N104, V111, G112, M115, F116, P166, W107, Y184,
Y187,
R188, G191, G192, and F195.
[0087] In some embodiments, the polypeptide can be an isolated recombinant
glutathione S-
transferase enzyme, comprising an amino acid sequence having at least 95%
identity to
SEQ ID NO:101. The amino acid sequence can conserve residues T19, I20, S21,
P22, V24, W25, T27, K28, Y29, A30, H33, K34, G35, F36, D39, I40, V41, P42, G43,
G44, F45, G47, I48, E50, R51, T52, G53, G54; K100, A101, N104, V111, G112,
M115, F116, P166,
W107, Y184, Y187, R188, G191, G192, and F195; wherein, the amino acid sequence
functions to cleave a beta-aryl ether.
[0088] In some embodiments, the polypeptide can be an isolated recombinant
glutathione S-
transferase enzyme, comprising an amino acid sequence having at least 95%
identity to
SEQ ID NO:101; wherein, the amino acid sequence functions to cleave a beta-
aryl ether.
[0089] In some embodiments, the polypeptide can be an isolated recombinant
polypeptide,
comprising (i) a length ranging from about 279 to about 281 amino acids; (ii)
a first amino
acid region consisting of residues 19-54 from SEQ ID NO:101, or conservative
substitutions
thereof outside of conserved residues T19, I20, S21, P22, V24, W25, T27, K28,
Y29, A30,
H33, K34, G35, F36, D39, I40, V41, P42, G43, G44, F45, G47, I48, E50, R51, T52,
G53,
and G54; wherein, the first amino acid region can be located in the
recombinant polypeptide
from about residue 14 to about residue 59; and, (iii) a second amino acid
region consisting
of residues 98-221 from SEQ ID NO:101, or conservative substitutions thereof
outside of
conserved residues K100, A101, N104, V111, G112, M115, F116, P166, W107, Y184,
Y187, R188, G191, G192, and F195; wherein, the second amino acid region is
located in
the recombinant polypeptide from about residue 93 to about residue 226.
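The positional requirement described above, in which a region must lie within a stated residue window of the polypeptide, can be sketched as follows. This Python fragment is a minimal illustration under stated assumptions: it locates the region by exact string match, so conservative substitutions are not considered, it treats the window bounds as hard limits although the text recites them as approximate, and the toy sequences and function names are hypothetical.

```python
from typing import Optional, Tuple


def locate_region(polypeptide: str, region: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (start, end) of the first exact match, or None."""
    index = polypeptide.find(region)
    if index < 0:
        return None
    return index + 1, index + len(region)


def region_in_window(polypeptide: str, region: str,
                     window_start: int, window_end: int) -> bool:
    """True if the region occurs entirely inside the 1-based residue window."""
    location = locate_region(polypeptide, region)
    if location is None:
        return False
    start, end = location
    return window_start <= start and end <= window_end


# Hypothetical toy data: a 9-residue region flanked by two residues per side.
toy_region = "ACDEFGHIK"
toy_polypeptide = "GG" + toy_region + "LL"

print(locate_region(toy_polypeptide, toy_region))
print(region_in_window(toy_polypeptide, toy_region, 1, 13))
```

A fuller treatment would align the region to the polypeptide so that permitted substitutions are tolerated; the exact-match search is used here only to keep the positional logic visible.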
[0090] In some embodiments, the polypeptide can be an isolated recombinant
glutathione S-
transferase enzyme, comprising (i) a length ranging from about 279 to about
281 amino
acids; (ii) a first amino acid region having at least 95% identity to residues
19-54 from SEQ
ID NO:101 while conserving residues T19, I20, S21, P22, V24, W25, T27, K28,
Y29, A30,
H33, K34, G35, F36, D39, I40, V41, P42, G43, G44, F45, G47, I48, E50, R51, T52,
G53,
and G54; wherein, the first amino acid region is located in the recombinant
polypeptide from
about residue 14 to about residue 59; and, (iii) a second amino acid region
having at least
95% identity to residues 98-221 from SEQ ID NO:101 while conserving residues
K100,
A101, N104, V111, G112, M115, F116, P166, W107, Y184, Y187, R188, G191, G192,
and
F195; wherein, the second amino acid region can be located in the recombinant
polypeptide
from about residue 93 to about residue 226; and, the recombinant glutathione S-
transferase
enzyme can function to cleave a beta-aryl ether.
[0091] In some embodiments, the polypeptide can be an isolated recombinant
glutathione S-
transferase enzyme, comprising an amino acid sequence having at least 95%
identity to
SEQ ID NO:541; wherein, the amino acid sequence functions to cleave a beta-
aryl ether.
[0092] In some embodiments, the polypeptide can be an isolated recombinant
polypeptide,
comprising (i) a length ranging from about 256 to about 260 amino acids; (ii)
a first amino
acid region consisting of residues 47-57 from SEQ ID NO:541, or conservative
substitutions
thereof outside of conserved residues A47,148, N49, P50, G52, V54, P55, V56,
L57;
wherein, the first amino acid region is located in the recombinant polypeptide
from about
residue 45 to about residue 57; (iii) a second amino acid region consisting of
63-76 from
SEQ ID NO:541; and, (iv) a third amino acid region consisting of residues 99-
230 from SEQ
ID NO:541, or conservative substitutions thereof outside of conserved residues
R100, Y101,
K104, D107, M111, N112, S115, M116, K176, L194, I197, N198, S201, H202, and
M206;
wherein, the second amino acid region is located in the recombinant
polypeptide from about
residue 94 to about residue 235.
[0093] In some embodiments, the polypeptide can be an isolated recombinant
glutathione S-
transferase enzyme, comprising (i) a length ranging from about 279 to about
281 amino
acids; (ii) a first amino acid region having at least 95% identity to 47-57
from SEQ ID
NO:541, or conservative substitutions thereof outside of conserved residues
A47, I48, N49,
P50, G52, V54, P55, V56, L57; wherein, the first amino acid region can be
located in the
recombinant polypeptide from about residue 45 to about residue 57; (iii) a
second amino
acid region consisting of 63-76 from SEQ ID NO:541; and, (iv) a third amino
acid region
having at least 95% identity to residues 99-230 from SEQ ID NO:541, or
conservative
substitutions thereof outside of conserved residues R100, Y101, K104, D107,
M111, N112,
S115, M116, K176, L194, I197, N198, S201, H202, and M206; wherein, the second
amino
acid region can be located in the recombinant polypeptide from about residue
94 to about
residue 235; wherein, the recombinant glutathione S-transferase enzyme
functions to cleave
a beta-aryl ether.
[0094] In some embodiments, an amino acid substitution outside of the
conserved residues can
be a conservative substitution. And, in many embodiments, the amino acid
sequence can
function to cleave a beta-aryl ether.
Methods of Preparing the Recombinant SDF-1 Polynucleotide and Polypeptides
[0095] The teachings include a method of preparing the polypeptides described
herein,
comprising culturing a host cell under conditions suitable to produce the
desired
polypeptide; and recovering the polypeptide from the host cell culture;
wherein, the host cell
comprises an exogenously-derived polynucleotide encoding the desired
polypeptide. In
some embodiments, the host cell is E. coli. In some embodiments, the host cell
can be an
Azotobacter strain such as, for example, Azotobacter vinelandii.
[0096] Initially, a double-stranded DNA fragment encoding the primary amino
acid sequence of the recombinant polypeptide can be designed. This DNA fragment
can be manipulated to facilitate synthesis, cloning, expression or biochemical
manipulation of the expression products. The synthetic gene can be ligated to a
suitable cloning vector and then the nucleotide sequence of the cloned gene can
be determined and confirmed. The gene can then be amplified using designed
primers having specific restriction enzyme sequences introduced at both ends of
the inserted gene, and the gene can be subcloned into a suitable
subclone/expression vector. The expression vector bearing the synthetic gene
for the
mutant can be inserted into a suitable expression host. Thereafter the
expression host can
be maintained under conditions suitable for production of the gene product
and, in some
embodiments, the protein can be (i) isolated and purified from the cells
expressing the gene
or (ii) used directly in a reaction environment that includes the host cell.
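The primer design step described above, in which restriction enzyme sequences are introduced at both ends of the gene, might be sketched as follows. This Python fragment is illustrative only: the choice of EcoRI and XhoI (both named in the FIG. 2 legend), the 18-nucleotide annealing length, the GCGC 5' clamp, the toy gene sequence, and the function names are all assumptions made for the example rather than parameters taught herein.

```python
# Recognition sequences of two commonly used restriction enzymes.
SITES = {"EcoRI": "GAATTC", "XhoI": "CTCGAG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(gene: str, fwd_enzyme: str, rev_enzyme: str,
                   anneal_len: int = 18, clamp: str = "GCGC") -> tuple:
    """Build forward and reverse primers that add a site to each gene end.

    Each primer is: clamp + recognition site + gene-specific annealing region.
    The reverse primer anneals to the reverse complement of the gene's 3' end.
    """
    if len(gene) < anneal_len:
        raise ValueError("gene is shorter than the annealing region")
    forward = clamp + SITES[fwd_enzyme] + gene[:anneal_len]
    reverse = clamp + SITES[rev_enzyme] + reverse_complement(gene[-anneal_len:])
    return forward, reverse


# Hypothetical 42-nucleotide open reading frame used only as a placeholder.
toy_gene = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCTAA"
fwd_primer, rev_primer = design_primers(toy_gene, "EcoRI", "XhoI")
print(fwd_primer)
print(rev_primer)
```

Both recognition sequences are palindromic, so the same site string can be written into either primer. In practice one would also confirm that the chosen sites do not occur inside the gene itself before subcloning.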
[0097] The nucleic acid (e.g., cDNA or genomic DNA) may be inserted into a
replicable vector
for cloning (amplification of the DNA) or for expression. Various vectors are
publicly available.
In general, DNA can be inserted into an appropriate restriction endonuclease
site(s) using
techniques known in the art, for example. Vector components generally include,
but are not
limited to, one or more of a signal sequence, an origin of replication, one or
more marker
genes, an enhancer element, a promoter, and a transcription termination
sequence.
[0098] The signal sequence may be a prokaryotic signal sequence selected, for
example, from
the group of the alkaline phosphatase, penicillinase, lpp, or heat-stable
enterotoxin II
leaders. For yeast secretion the signal sequence may be, e.g., the yeast
invertase leader,
alpha factor leader (including Saccharomyces and Kluyveromyces alpha-factor
leaders, the
latter described in U.S. Pat. No. 5,010,182), or acid phosphatase leader, the
C. albicans
glucoamylase leader (EP 362,179), or the signal described in WO 90/13646, for
example.
In mammalian cell expression, mammalian signal sequences may be used to direct
secretion of the protein, such as signal sequences from secreted polypeptides
of the same
or related species, as well as viral secretory leaders.
[0099] Both expression and cloning vectors contain a nucleic acid sequence
that enables the
vector to replicate in one or more selected host cells. Such sequences are
well known for a
variety of bacteria, yeast, and viruses. The origin of replication from the
plasmid pBR322, for example, is suitable for most Gram-negative bacteria, the
2μ plasmid
origin is suitable for yeast, and various viral origins (SV40, polyoma,
adenovirus, VSV or
BPV) are useful for cloning vectors in mammalian cells.
[00100] Expression and cloning vectors will typically contain a selection
gene, also
termed a selectable marker. Typical selection genes encode proteins that (a)
confer
resistance to antibiotics or other toxins, e.g., ampicillin, neomycin,
methotrexate, or
tetracycline, (b) complement auxotrophic deficiencies, or (c) supply critical
nutrients not
available from complex media, e.g., the gene encoding D-alanine racemase for
Bacilli.
[00101] Examples of suitable selectable markers for mammalian cells are
those that
enable the identification of cells competent to take the encoding nucleic
acid, such as DHFR
or thymidine kinase. An appropriate host cell when wild-type DHFR is employed
is the CHO
cell line deficient in DHFR activity, prepared and propagated as described by
Urlaub et al.,
Proc. Natl. Acad. Sci. USA, 77:4216 (1980). A suitable selection gene for use
in yeast is the
trp1 gene present in the yeast plasmid YRp7 (Stinchcomb et al., Nature, 282:39
(1979);
Kingsman et al., Gene, 7:141 (1979); Tschemper et al., Gene, 10:157 (1980)).
The trp1
gene provides a selection marker for a mutant strain of yeast lacking the
ability to grow in
tryptophan, for example, ATCC No. 44076 or PEP4-1 (Jones, Genetics, 85:12
(1977)).
[00102] Expression and cloning vectors usually contain a promoter operably
linked to the
encoding nucleic acid sequence to direct mRNA synthesis. Promoters recognized
by a
variety of potential host cells are well known. Promoters suitable for use
with prokaryotic
hosts include the .beta.-lactamase and lactose promoter systems (Chang et al.,
Nature,
275:615 (1978); Goeddel et al., Nature, 281:544 (1979)), alkaline phosphatase,
a tryptophan
(trp) promoter system (Goeddel, Nucleic Acids Res., 8:4057 (1980); EP 36,776),
and hybrid
promoters such as the tac promoter (deBoer et al., Proc. Natl. Acad. Sci. USA,
80:21-25
(1983)). Promoters for use in bacterial systems also will contain a Shine-
Dalgarno
sequence operably linked to the encoding DNA.
[00103] Other yeast promoters, which are inducible promoters having the
additional
advantage of transcription controlled by growth conditions, are the promoter
regions for
alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative
enzymes
associated with nitrogen metabolism, metallothionein, glyceraldehyde-3-
phosphate
dehydrogenase, and enzymes responsible for maltose and galactose utilization.
Suitable
vectors and promoters for use in yeast expression are known in the art, e.g.
see EP 73,657
for a further discussion.
[00104] Transcription from vectors in mammalian host cells is
controlled, for
example, by promoters obtained from the genomes of viruses such as polyoma
virus,
fowlpox virus (UK 2,211,504), adenovirus (such as Adenovirus 2), bovine
papilloma virus,
avian sarcoma virus, cytomegalovirus, a retrovirus, hepatitis-B virus and
Simian Virus 40
(SV40), from heterologous mammalian promoters, e.g., the actin promoter or an
immunoglobulin promoter, and from heat-shock promoters, provided such
promoters are
compatible with the host cell systems.
[00105] Transcription of the encoding DNA by higher eukaryotes may be
increased by
inserting an enhancer sequence into the vector. Enhancers are cis-acting
elements of DNA,
usually about from 10 to 300 bp, that act on a promoter to increase its
transcription. Many
enhancer sequences are now known from mammalian genes (globin, elastase,
albumin,
α-fetoprotein, and insulin). Typically, however, one will use an enhancer from a
eukaryotic cell
virus. Examples include the SV40 enhancer on the late side of the replication
origin, the
cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side
of the
replication origin, and adenovirus enhancers. The enhancer may be spliced into
the vector
at a position 5' or 3' to the coding sequence but is preferably located at a
site 5' from the
promoter.
[00106] Expression vectors used in eukaryotic host cells (yeast, fungi,
insect, plant,
animal, human, or nucleated cells from other multicellular organisms) will
also contain
sequences necessary for the termination of transcription and for stabilizing
the mRNA.
Such sequences are commonly available from the 5' and, occasionally 3',
untranslated
regions of eukaryotic or viral DNAs or cDNAs. These regions contain nucleotide
segments
transcribed as polyadenylated fragments in the untranslated portion of the
mRNA encoding
the mutants.
[00107] In some embodiments, the expression control sequence can be
selected from a
group consisting of a lac system, T7 expression system, major operator and
promoter
regions of pBR322 origin, and other prokaryotic control regions. Still other
methods,
vectors, and host cells suitable for adaptation to the synthesis of the
mutants in recombinant
vertebrate cell culture are described in Gething et al., Nature, 293:620-625
(1981); Mantei et
al., Nature, 281:40-46 (1979); EP 117,060; and EP 117,058.
[00108] Mutants can be expressed as a fusion protein. In some embodiments,
the
methods involve adding a number of amino acids to the protein, and in some
embodiments,
to the amino terminus of the protein. Extra amino acids can serve as affinity
tags or
cleavage sites, for example. Fusion proteins can be designed to: (1) assist in
purification
by acting as a temporary ligand for affinity purification, (2) produce a
precise recombinant by
removing extra amino acids using a cleavage site between the target gene and
affinity tag,
(3) increase the solubility of the product, and/or (4) increase expression of
the product. A
proteolytic cleavage site can be included at the junction of the fusion region
and the protein
of interest to enable further purification of the product, that is, separation of the
recombinant
protein from the fusion protein following affinity purification of the fusion
protein. Such
enzymes, and their cognate recognition sequences, can include Factor Xa,
thrombin and
enterokinase, cyanogen bromide, trypsin, or chymotrypsin, for example. Typical
fusion
expression vectors include pGEX (Pharmacia Biotech Inc; Smith, D. B. and
Johnson, K. S.
Gene 67:31-40 (1988)), pMAL (New England Biolabs, Beverly, Mass.), pRIT5
(Pharmacia,
Piscataway, N.J.), and pET (Strategen), which can fuse glutathione S-
transferase (GST),
maltose E binding protein, protein A, or a six-histidine sequence,
respectively, to a target
recombinant protein.
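The tag-and-cleave design described above can be sketched in a few lines. This is a minimal illustration only: the six-histidine tag, the placeholder target sequence, and the function names are assumptions made for the sketch and do not come from the teachings herein. The recognition sequences shown (Ile-Glu-Gly-Arg for Factor Xa, Asp-Asp-Asp-Asp-Lys for enterokinase) are the commonly cited ones, and both proteases are modeled as cutting immediately after the final residue of the site.

```python
from typing import Tuple

# Hypothetical values for illustration; not sequences from the source.
HIS6_TAG = "HHHHHH"
# Commonly cited recognition sequences; cleavage is modeled as occurring
# immediately after the last residue of the site.
CLEAVAGE_SITES = {"factor_xa": "IEGR", "enterokinase": "DDDDK"}


def build_fusion(target: str, protease: str, tag: str = HIS6_TAG) -> str:
    """Return an N-terminal fusion: affinity tag + cleavage site + target."""
    return tag + CLEAVAGE_SITES[protease] + target


def cleave(fusion: str, protease: str) -> Tuple[str, str]:
    """Split the fusion after the first occurrence of the recognition site.

    Returns (tag_fragment, released_protein).
    """
    site = CLEAVAGE_SITES[protease]
    start = fusion.find(site)
    if start < 0:
        raise ValueError("recognition site not found in fusion")
    cut = start + len(site)
    return fusion[:cut], fusion[cut:]


target = "MKTAYIAK"  # placeholder target protein
fusion = build_fusion(target, "factor_xa")
tag_fragment, released = cleave(fusion, "factor_xa")
```

Because the modeled cut falls after the recognition site, the released protein carries no extra amino acids from the tag, which corresponds to design goal (2) above.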
[00109] Synthetic DNAs containing the sequences of nucleotides, tags and
cleavage
sites can be designed and provided as a modified coding for recombinant
polypeptide
mutants. In some embodiments, a polypeptide can be a fusion polypeptide having
an
affinity tag, and the recovering step includes (1) capturing and purifying the
fusion
polypeptide, and (2) removing the affinity tag for high yield production of
the desired
polypeptide or an amino acid sequence that is at least 95% homologous to a
desired
polypeptide. DNA encoding the mutants may be obtained from a cDNA library
prepared
from tissue possessing the mRNA for the mutants. As such, the DNA can be
conveniently
obtained from a cDNA library. The encoding gene for the mutants may also be
obtained
from a genomic library or by known synthetic procedures (e.g., automated
nucleic acid
synthesis).
[00110] Libraries can be screened with probes designed to identify the gene
of interest or
the protein encoded by it. Screening the cDNA or genomic library with the
selected probe
may be conducted using standard hybridization procedures, such as described in
Sambrook
et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor
Laboratory
Press, 1989), which is herein incorporated by reference. An alternative means
to isolate the
gene encoding recombinant polypeptide mutants is to use PCR methodology
[Sambrook et
al., supra; Dieffenbach et al., PCR Primer: A Laboratory Manual (Cold Spring
Harbor
Laboratory Press, 1995)].
[00111] Nucleic acids having a desired protein coding sequence may be
obtained by
screening selected cDNA or genomic libraries using a deduced amino acid
sequence and, if
necessary, a conventional primer extension procedure as described in Sambrook
et al.,
supra, to detect precursors and processing intermediates of mRNA that may not
have been
reverse-transcribed into cDNA.
[00112] The selection of expression vectors, control sequences,
transformation methods,
and the like, are dependent on the type of host cell used to express the gene.
Following
entry into a cell, all or part of the vector DNA, including the insert DNA,
may be incorporated
into the host cell chromosome, or the vector may be maintained
extrachromosomally.
Those vectors that are maintained extrachromosomally are frequently capable of
autonomous replication in the host cell. Other vectors are integrated into the
genome of a
host cell upon introduction into the host cell and are replicated along with the host genome.
[00113] Host cells are transfected or transformed with the expression or
cloning vectors
described herein to produce the mutants. The cells are cultured in
conventional nutrient
media modified as appropriate for inducing promoters, selecting transformants,
or amplifying
the genes encoding the desired sequences. The culture conditions, such as
media,
temperature, pH and the like, can be selected by the skilled artisan without
undue
experimentation. In general, principles, protocols, and practical techniques
for maximizing
the productivity of cell cultures can be found in Mammalian Cell
Biotechnology: a Practical
Approach, M. Butler, ed. (IRL Press, 1991) and Sambrook et al., supra, each of
which is
incorporated by reference.
[00114] The host cells can be prokaryotic or eukaryotic, and suitable host
cells for cloning
or expressing the DNA in the vectors herein can include prokaryote, yeast, or
higher
eukaryote cells. Methods of eukaryotic cell transfection and prokaryotic cell
transformation
are known to the ordinarily skilled artisan, for example, CaCl2, CaPO4,
liposome-mediated
and electroporation. Depending on the host cell used, transformation is
performed using
standard techniques appropriate to such cells. The calcium treatment employing
calcium
chloride, as described in Sambrook et al., supra, or electroporation is
generally used for
prokaryotes. Infection with Agrobacterium tumefaciens is used for
transformation of certain
plant cells, as described by Shaw et al., Gene, 23:315 (1983) and WO 89/05859
published
29 Jun. 1989. For mammalian cells without such cell walls, the calcium
phosphate
precipitation method of Graham and van der Eb, Virology, 52:456-457 (1978) can
be
employed. General aspects of mammalian cell host system transfections have
been
described in U.S. Pat. No. 4,399,216. Transformations into yeast are typically
carried out
according to the method of Van Solingen et al., J. Bact., 130:946 (1977) and
Hsiao et al.,
Proc. Natl. Acad. Sci. (USA), 76:3829 (1979). However, other methods for
introducing DNA
into cells, such as by nuclear microinjection, electroporation, bacterial
protoplast fusion with
intact cells, or polycations, e.g., polybrene, polyornithine, may also be
used. For various
techniques for transforming mammalian cells, see Keown et al., Methods in
Enzymology,
185:527-537 (1990) and Mansour et al., Nature, 336:348-352 (1988).
[00115] Suitable host cells for cloning or expressing the DNA in the
vectors herein
include prokaryote, yeast, or higher eukaryote cells. Suitable prokaryotes
include, but are
not limited to, eubacteria, such as Gram-negative or Gram-positive organisms,
for example,
Enterobacteriaceae such as E. coli. Various E. coli strains are publicly
available, such as E.
coli K12 strain MM294 (ATCC 31,446); E. coli X1776 (ATCC 31,537); E. coli
strain W3110
(ATCC 27,325) and K5 772 (ATCC 53,635). Other suitable prokaryotic host cells
include
Enterobacteriaceae such as Escherichia, e.g., E. coli, Enterobacter, Erwinia,
Klebsiella,
Proteus, Salmonella, e.g., Salmonella typhimurium, Serratia, e.g., Serratia
marcescens,
and Shigella, as well as Bacilli such as B. subtilis and B. licheniformis
(e.g., B. licheniformis
41P disclosed in DD 266,710 published 12 Apr. 1989), Pseudomonas such as P.
aeruginosa, and Streptomyces. These examples are illustrative rather than
limiting, and
merely supplement the remainder of the teachings herein. Strain W3110 is one
particularly
preferred host or parent host because it is a common host strain for
recombinant DNA
product fermentations. Preferably, the host cell secretes minimal amounts of
proteolytic
enzymes. For example, strain W3110 may be modified to effect a genetic
mutation in the
genes encoding proteins endogenous to the host, with examples of such hosts
including E.
coli W3110 strain 1A2, which has the complete genotype tonA; E. coli W3110
strain 9E4,
which has the complete genotype tonA ptr3; E. coli W3110 strain 27C7 (ATCC
55,244),
which has the complete genotype tonA ptr3 phoA E15 (argF-lac)169 degP ompT
kanr; E.
coli W3110 strain 37D6, which has the complete genotype tonA ptr3 phoA E15
(argF-lac)169 degP ompT rbs7 ilvC kanr; E. coli W3110 strain 40B4, which is 37D6
with a non-
kanamycin resistant degP deletion mutation; and an E. coli strain having
mutant periplasmic
protease as disclosed in U.S. Pat. No. 4,946,783. Alternatively, in vitro
methods of cloning,
e.g., PCR or other nucleic acid polymerase reactions, are suitable.
[00116] In addition to prokaryotes, eukaryotic microbes such as filamentous
fungi or
yeast are suitable cloning or expression hosts for the mutants. Saccharomyces
cerevisiae is
a commonly used lower eukaryotic host microorganism. Others include
Schizosaccharomyces pombe (Beach and Nurse, Nature, 290: 140 (1981); EP
139,383
published 2 May 1985); Kluyveromyces hosts (U.S. Pat. No. 4,943,529; Fleer et
al.,
Bio/Technology, 9:968-975 (1991)) such as, e.g., K. lactis (MW98-8C, CBS683,
CBS4574;
Louvencourt et al., J. Bacteriol., 154(2):737-742 (1983)), K. fragilis (ATCC
12,424), K.
bulgaricus (ATCC 16,045), K. wickeramii (ATCC 24,178), K. waltii (ATCC
56,500), K.
drosophilarum (ATCC 36,906; Van den Berg et al., Bio/Technology, 8:135
(1990)), K.
thermotolerans, and K. marxianus; Yarrowia (EP 402,226); Pichia pastoris (EP
183,070;
Sreekrishna et al., J. Basic Microbiol., 28:265-278 [1988]); Candida;
Trichoderma reesei (EP
244,234); Neurospora crassa (Case et al., Proc. Natl. Acad. Sci. USA, 76:5259-5263
(1979)); Schwanniomyces such as Schwanniomyces occidentalis (EP 394,538); and
filamentous fungi such as, e.g., Neurospora, Penicillium, Tolypocladium (WO
91/00357),
and Aspergillus hosts such as A. nidulans (Ballance et al., Biochem. Biophys.
Res.
Commun., 112:284-289 (1983); Tilburn et al., Gene, 26:205-221 (1983); Yelton et
al., Proc.
Natl. Acad. Sci. USA, 81:1470-1474 (1984)) and A. niger (Kelly and Hynes,
EMBO J., 4:475-479 (1985)).
Methylotrophic yeasts are suitable herein and include, but are not
limited to,
yeast capable of growth on methanol selected from the genera consisting of
Hansenula,
Candida, Kloeckera, Pichia, Saccharomyces, Torulopsis, and Rhodotorula. A list
of specific
species that are exemplary of this class of yeasts may be found in C. Anthony,
The
Biochemistry of Methylotrophs, 269 (1982).
[00117] Suitable host cells for the expression of glycosylated mutants can
be derived
from multicellular organisms. Invertebrate cells include insect cells such as
Drosophila S2
and Spodoptera Sf9, as well as plant cells. Useful mammalian host cell lines
include
Chinese hamster ovary (CHO) and COS cells. More specific examples include
monkey
kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); human embryonic
kidney
line (293 or 293 cells subcloned for growth in suspension culture, Graham et
al., J. Gen
Virol., 36:59 (1977)); Chinese hamster ovary cells/-DHFR (CHO, Urlaub and
Chasin, Proc.
Natl. Acad. Sci. USA, 77:4216 (1980)); mouse sertoli cells (TM4, Mather, Biol.
Reprod.,
23:243-251 (1980)); human lung cells (WI-38, ATCC CCL 75); human liver cells
(Hep G2,
HB 8065); and mouse mammary tumor (MMT 060562, ATCC CCL51). One of skill can
readily choose the appropriate host cell, at least for extracellular protein
harvesting
embodiments, without undue experimentation.
[00118] In some embodiments, a nucleotide sequence will be hybridizable,
under
moderately stringent conditions, to a nucleic acid having a nucleotide
sequence comprising
or complementary to the desired nucleotide sequences. In some embodiments, an
isolated
nucleotide sequence will be hybridizable, under stringent conditions, to a
nucleic acid having
a nucleotide sequence comprising or complementary to the desired nucleotide
sequences.
A nucleic acid molecule can be "hybridizable" to another nucleic acid molecule
when a
single-stranded form of the nucleic acid molecule can anneal to the other
nucleic acid
molecule under the appropriate conditions of temperature and ionic strength
(see Sambrook
et al., supra). The conditions of temperature and ionic strength determine
the "stringency"
of the hybridization. "Hybridization" requires that two nucleic acids contain
complementary
sequences. However, depending on the stringency of the hybridization,
mismatches
between bases may occur. The appropriate stringency for hybridizing nucleic
acids
depends on the length of the nucleic acids and the degree of complementation.
Such
variables are well known in the art. More specifically, the greater the degree
of similarity or
homology between two nucleotide sequences, the greater the value of Tm for
hybrids of
nucleic acids having those sequences. For hybrids of greater than 100
nucleotides in
length, equations for calculating Tm have been derived (see Sambrook et al.,
supra). For
hybridization with shorter nucleic acids, the position of mismatches becomes
more
important, and the length of the oligonucleotide determines its specificity
(see Sambrook et
al., supra).
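As an illustration of the kind of Tm equation referred to above, the following sketch implements one widely quoted empirical approximation for long DNA:DNA hybrids. The formula is not taken from the teachings herein; it is an assumption brought in for illustration, its coefficients vary somewhat between references, and the function name and parameters are hypothetical. The rule of thumb in the second function (roughly 1 °C lower Tm per 1% mismatch) is likewise only an approximation.

```python
import math


def estimate_tm_celsius(gc_percent: float, sodium_molar: float, length_nt: int) -> float:
    """Approximate Tm (deg C) for a DNA:DNA hybrid longer than ~100 nt.

    Assumed empirical form (coefficients differ slightly between sources):
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 600/length
    """
    if length_nt <= 100:
        raise ValueError("approximation is intended for hybrids > 100 nt")
    if sodium_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    return (81.5
            + 16.6 * math.log10(sodium_molar)
            + 0.41 * gc_percent
            - 600.0 / length_nt)


def tm_with_mismatch(tm_perfect: float, mismatch_percent: float) -> float:
    """Apply the rough rule of about 1 deg C lower Tm per 1% mismatch."""
    return tm_perfect - mismatch_percent


tm = estimate_tm_celsius(gc_percent=50.0, sodium_molar=1.0, length_nt=600)
```

The sketch makes the qualitative statements in the paragraph concrete: Tm rises with G+C content and hybrid length, and falls as the fraction of mismatched bases grows.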
[00119] In some embodiments, the polynucleotides and polypeptides have at
least 55,
60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99 percent
homology to a
desired polynucleotide or polypeptide. In some embodiments, the
polynucleotides and
polypeptides have at least 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95,
96, 97, 98, or 99
percent identity to a desired polynucleotide or polypeptide. And, in some
embodiments, the
polynucleotides and polypeptides have at least 55, 60, 65, 70, 75, 80, 85, 90,
91, 92, 93, 94,
95, 96, 97, 98, or 99 percent similarity to a desired polynucleotide or
polypeptide. As
described above, degenerate forms of the desired polynucleotide are also
acceptable. In
some embodiments, a polypeptide can be 90, 91, 92, 93, 94, 95, 96, 97, 98, or
99 percent
homologous, identical, or similar to a desired polypeptide as long as it
shares the same
function as the desired polypeptide, and the extent of the function can be
less or more than
that of the desired polypeptide. In some embodiments, for example, a
polypeptide can have
a function that is 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, or any
0.1%
increment in-between, that of the desired polypeptide. And, in some
embodiments, for
example, a polypeptide can have a function that is 110%, 120%, 130%, 140%,
150%, 160%,
170%, 180%, 190%, 200%, 300%, 400%, 500%, or more, or any 1% increment in-
between,
that of the desired polypeptide. In some embodiments the "function" is an
enzymatic
activity, measurable by any method known to one of skill such as, for example,
a method
used in the teachings herein. The "desired polypeptide" or "desired
polynucleotide" can be
referred to as a "reference polypeptide" or "reference polynucleotide", or the
like, in some
embodiments as a control for comparison of a polypeptide of interest, which
may be
considered a "test polypeptide" or "test polynucleotide" or the like. In any
event, the
comparison is that of one set of bases or amino acids against another set for
purposes of
measuring homology, identity, or similarity. The ability to hybridize is, of
course, another
way of comparing nucleotide sequences.
[00120] The terms "homology" and "homologous" can be used interchangeably
in some
embodiments. The terms can refer to nucleic acid sequence matching and the
degree to
which changes in the nucleotide bases between polynucleotide sequences affects
the gene
expression. These terms also refer to modifications, such as deletion or
insertion of one or
more nucleotides, and the effects of those modifications on the functional
properties of the
resulting polynucleotide relative to the unmodified polynucleotide. Likewise
the terms refer
to polypeptide sequence matching and the degree to which changes in the
polypeptide
sequences, such as those seen when comparing the modified polypeptides to the
unmodified polypeptide, affect the function of the polypeptide. It should
be appreciated by one
of skill that the polypeptides, such as the mutants taught herein, can be
produced from two
non-homologous polynucleotide sequences within the limits of degeneracy.
[00121] The terms "similarity" and "identity" are known in the art. The
term "identity" can
be used to refer to a sequence comparison based on identical matches between
correspondingly identical positions in the sequences being compared. The term
"similarity"
can be used to refer to a comparison between amino acid sequences, and takes
into
account not only identical amino acids in corresponding positions, but also
functionally
similar amino acids in corresponding positions. Thus similarity between
polypeptide
sequences indicates functional similarity, in addition to sequence similarity.
Levels of
identity between gene sequences and levels of identity or similarity between
amino acid
sequences can be calculated using known methods. For example, publicly
available
computer based methods for determining identity and similarity include the
BLASTP,
BLASTN and FASTA (Altschul et al., J. Molec. Biol., 1990; 215:403-410), the
BLASTX
program available from NCBI, and the Gap program from Genetics Computer Group,
Madison Wis. In some embodiments, the Gap program, with a Gap penalty of 12
and a Gap
length penalty of 4 can be used for determining the amino acid sequence
comparisons, and
a Gap penalty of 50 and a Gap length penalty of 3 for the polynucleotide
sequence
comparisons. In some embodiments, the sequences can be aligned so that the
highest
order match is obtained. The match can be calculated using published
techniques that
include, for example, Computational Molecular Biology, Lesk, A. M., ed.,
Oxford University
Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith,
D. W., ed.,
Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I,
Griffin, A.
M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence
Analysis in
Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis
Primer,
Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991, each of
which is
incorporated by reference herein.
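To illustrate how a gap penalty and a gap length penalty enter an alignment score, the following is a minimal sketch of a global alignment with affine gap costs. It is not the Gap program itself. The match and mismatch scores are hypothetical placeholders (a real comparison would use a substitution matrix), and the cost of a gap of k positions is assumed here to be the gap penalty plus k times the gap length penalty.

```python
NEG_INF = float("-inf")


def global_alignment_score(a, b, gap_open=12, gap_extend=4, match=5, mismatch=-4):
    """Best global alignment score with affine gaps (three-matrix recurrence).

    Assumption: a gap spanning k positions costs gap_open + k * gap_extend.
    match/mismatch are placeholder scores, not values from the source.
    """
    n, m = len(a), len(b)
    # M: last column pairs two residues; X: residue of a over a gap in b;
    # Y: residue of b over a gap in a.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + j * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an existing gap costs gap_extend; opening one also
            # pays gap_open.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          Y[i - 1][j] - gap_open,
                          X[i - 1][j]) - gap_extend
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          X[i][j - 1] - gap_open,
                          Y[i][j - 1]) - gap_extend
    return max(M[n][m], X[n][m], Y[n][m])
```

With the defaults above, aligning "ACGT" against "ACT" gives three matches (15) minus one single-position gap (12 + 4), for a score of -1. Passing gap_open=50 and gap_extend=3 would mirror the values mentioned above for polynucleotide comparisons.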
[00122] As such, the term "similarity" is similar to "identity", but in
contrast to identity,
similarity can be used to refer to both identical matches and conservative
substitution
matches. For example, if two polypeptide sequences have 10/20 identical amino
acids, and
the remainder are all non-conservative substitutions, then the percent
identity and similarity
would both be 50%. On the other hand, if there are five more positions where
there are
conservative substitutions, then the percent identity is 50%, whereas the
percent similarity is
75%.
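The arithmetic in this example can be written out directly. The sketch below assumes the positions have already been aligned and counted; the function name and arguments are hypothetical.

```python
def percent_identity_and_similarity(identical, conservative, aligned_positions):
    """Return (percent identity, percent similarity) for an alignment.

    identical:          positions with the same amino acid in both sequences
    conservative:       positions with a conservative substitution
    aligned_positions:  total number of positions compared
    """
    if aligned_positions <= 0:
        raise ValueError("aligned_positions must be positive")
    if identical < 0 or conservative < 0:
        raise ValueError("counts must be non-negative")
    if identical + conservative > aligned_positions:
        raise ValueError("counts exceed the number of aligned positions")
    identity = 100.0 * identical / aligned_positions
    # Similarity counts identical matches plus conservative substitutions.
    similarity = 100.0 * (identical + conservative) / aligned_positions
    return identity, similarity


# 10/20 identical, remainder non-conservative: identity and similarity match.
case_one = percent_identity_and_similarity(10, 0, 20)
# Same alignment, but five of the remaining positions are conservative.
case_two = percent_identity_and_similarity(10, 5, 20)
```

The first case returns 50% for both measures, and the second returns 50% identity with 75% similarity, matching the figures in the text.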
[00123] In some embodiments, the term "substantial sequence identity" can
refer to an
optimal alignment, such as by the programs GAP or BESTFIT using default gap
penalties,
having at least 65, 70, 75, 80, 85, 90, 95, 96, 97, 98, or 99 percent sequence
identity. The
difference in what is "substantial" regarding identity can often vary
according to a
corresponding percent similarity, since the factor of primary importance is
often the function
of the sequence in a system. The term "substantial percent identity" can be
used to refer to
a DNA sequence that is sufficiently similar to a reference sequence at the
nucleotide level to
code for the same protein, or a protein having substantially the same
function, in which the
comparison can allow for allelic differences in the coding region. Likewise,
the term can be
used to refer to a comparison of sequences of two polypeptides optimally
aligned.
[00124] In some embodiments, sequence comparisons can be made to a
reference
sequence over a "comparison window" of amino acids or bases that includes any
number of
amino acids or bases that is useful in the particular comparison. For example,
the reference
sequence may be a subset of a larger sequence. In some embodiments, the
comparison
window can include at least 10 residue or base positions, and sometimes at
least 15-20
amino acids or bases. The reference or test sequence may represent, for
example, a
polypeptide or polynucleotide having one or more deletions, substitutions or
additions.
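A comparison window can be illustrated with a simple sliding calculation. This sketch assumes the reference and test sequences are already aligned, of equal length, and free of gaps; the function name and the example sequences are hypothetical.

```python
def window_identities(reference, test, window=10):
    """Percent identity for every window of `window` aligned positions.

    Assumes `reference` and `test` are pre-aligned, ungapped, and equal in
    length. Returns one value per window start position.
    """
    if len(reference) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    if window < 1 or window > len(reference):
        raise ValueError("window must fit within the aligned sequences")
    matches = [r == t for r, t in zip(reference, test)]
    count = sum(matches[:window])
    result = [100.0 * count / window]
    for start in range(1, len(matches) - window + 1):
        # Slide by one: add the position entering, drop the one leaving.
        count += matches[start + window - 1] - matches[start - 1]
        result.append(100.0 * count / window)
    return result


# Twelve aligned positions that differ only at the last one.
identities = window_identities("ACDEFGHIKLMN", "ACDEFGHIKLMM", window=10)
```

Here the first two windows are 100% identical and the third, which includes the final mismatched position, is 90% identical.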
[00125] The term "variant" refers to modifications to a peptide that allow
the peptide to
retain its binding properties, and such modifications include, but are not
limited to,
conservative substitutions in which one or more amino acids are substituted
for other amino
acids; deletion or addition of amino acids that have minimal influence on the
binding
properties or secondary structure; conjugation of a linker; post-translational
modifications
such as, for example, the addition of functional groups. Examples of such post-
translational
modifications can include, but are not limited to, the addition of modifying
groups described
below through processes such as, for example, glycosylation, acetylation,
phosphorylation,
modifications with fatty acids, formation of disulfide bonds between peptides,
biotinylation,
PEGylation, and combinations thereof. In fact, in most embodiments, the
polypeptides can
be modified with any of the various modifying groups known to one of skill.
[00126] The terms "conservatively modified variant," "conservatively
modified
substitution," and "conservative substitution" can be used interchangeably in
some
embodiments. These terms can be used to refer to a conservative amino acid
substitution,
which is an amino acid substituted by an amino acid of similar charge density,
hydrophilicity/hydrophobicity, size, and/or configuration such as, for
example, substituting
valine for isoleucine. In comparison, a "non-conservatively modified variant"
refers to a non-
conservative amino acid substitution, which is an amino acid substituted by an
amino acid of
differing charge density, hydrophilicity/hydrophobicity, size, and/or
configuration such as, for
example, substituting valine for phenylalanine. One of skill will appreciate
that there are a
plurality of ways to define conservative substitutions, and any of these
methods may be
used with the teachings provided herein. In some embodiments, for example, a
substitution
can be considered conservative if an amino acid falling into one of the
following groups is
substituted by an amino acid falling in the same group: hydrophilic (Ala, Pro,
Gly, Glu, Asp,
Gln, Asn, Ser, Thr), aliphatic (Val, Ile, Leu, Met), basic (Lys, Arg, His),
aromatic (Phe, Tyr,
Trp), and sulphydryl (Cys). See Dayhoff, M.O. et al., National Biomedical
Research
Foundation, Georgetown University, Washington DC:89-99(1972), which is
incorporated
herein. In some embodiments, the substitution of amino acids can be considered
conservative where the side chain of the substitution has similar biochemical
properties to
the side chain of the substituted amino acid.
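The group-based definition given above lends itself to a short lookup. The sketch below encodes the five groups exactly as listed, using one-letter amino acid codes; the names GROUPS and classify_substitution are hypothetical, and other groupings could be substituted, since there is a plurality of ways to define conservative substitutions.

```python
# Groups as listed in the text, written with one-letter amino acid codes.
GROUPS = {
    "hydrophilic": set("APGEDQNST"),  # Ala Pro Gly Glu Asp Gln Asn Ser Thr
    "aliphatic": set("VILM"),         # Val Ile Leu Met
    "basic": set("KRH"),              # Lys Arg His
    "aromatic": set("FYW"),           # Phe Tyr Trp
    "sulphydryl": set("C"),           # Cys
}
# Reverse lookup: amino acid -> group name.
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_substitution(original, replacement):
    """Classify a single-residue change under the grouping above."""
    if original == replacement:
        return "identical"
    if GROUP_OF[original] == GROUP_OF[replacement]:
        return "conservative"
    return "non-conservative"


val_to_ile = classify_substitution("V", "I")
val_to_phe = classify_substitution("V", "F")
```

Under this grouping, valine to isoleucine is conservative and valine to phenylalanine is non-conservative, in agreement with the examples given above.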
Microbial systems - antimicrobial lignin-derived compounds
[00127] The antimicrobial activity of lignin-derived compounds is a major
problem
addressed by the systems taught herein. For example, typical industrial
fermentation
processes might utilize the microbes Escherichia coli K12 or Escherichia coli
B, or the yeast
Saccharomyces cerevisiae, and recombinant versions of these microbes, which
are well
characterized industrial strains. The problem is that aromatic compounds have
antimicrobial activities that are toxic to such industrial microbes, which
precludes the use of these microbes in biotransformations of lignin-derived
compounds.
[00128] The phenolic streams or soluble lignin streams derived from
pretreated
lignocellulosic biomass, for example, might contain aromatic and nonaromatic
compounds,
such as gallic acid, hydroxymethylfurfural alcohol, hydroxymethylfurfural,
furfural alcohol,
3,5-dihydroxybenzoate, furoic acid, 3,4-dihydroxybenzaldehyde,
hydroxybenzoate,
homovanillin, syringic acid, vanillin, and syringaldehyde. There are several
lignin-derived
compounds that are antimicrobials. For example, furfural, 4-
hydroxybenzaldehyde,
syringaldehyde, 5-hydroxymethylfurfural, and vanillin are each known to have
antimicrobial
activity against Escherichia coli, and might have an additive antimicrobial
activity against
Escherichia coli when present in combination. Moreover, veratraldehyde,
cinnamic acid and
the respective benzoic acid derivatives of vanillic acid, vanillylacetone, and
the cinnamic
acid derivatives o-coumaric acid, m-coumaric acid, and p-coumaric acid might
be
components of the phenolic streams from pretreated lignocellulosic biomass.
Veratraldehyde, cinnamic acid and the respective benzoic acid derivatives of
vanillic acid,
vanillylacetone, and cinnamic acid derivatives o-coumaric acid, m-coumaric
acid, and p-
coumaric acid, each have significant antifungal activities against the yeast
Saccharomyces
cerevisiae, and might have an additive antifungal activity against the yeast
Saccharomyces
cerevisiae when present in combination.
[00129] One or more of the following benzaldehyde derivatives might be
present in the
phenolic streams from pretreated lignocellulosic biomass: 2,4,6-
trihydroxybenzaldehyde,
2,5-dihydroxybenzaldehyde, 2,3,4-trihydroxybenzaldehyde, 2-hydroxy-5-
methoxybenzaldehyde, 2,3-dihydroxybenzaldehyde, 2-hydroxy-3-
methoxybenzaldehyde, 4-
hydroxy-2,6-dimethoxybenzaldehyde, 2,5-dihydroxybenzaldehyde, 2,4-
dihydroxybenzaldehyde, and 2-hydroxybenzaldehyde. Likewise, 2,4,6-
trihydroxybenzaldehyde, 2,5-dihydroxybenzaldehyde, 2,3,4-
trihydroxybenzaldehyde, 2-
hydroxy-5-methoxybenzaldehyde, 2,3-dihydroxybenzaldehyde, 2-hydroxy-3-
methoxybenzaldehyde, 4-hydroxy-2,6-dimethoxybenzaldehyde, 2,5-
dihydroxybenzaldehyde,
2,4-dihydroxybenzaldehyde, and 2-hydroxybenzaldehyde have each demonstrated
antibacterial activity against Escherichia coli, and might have an additive
antibacterial
activity against Escherichia coli when present in combination.
Microbial systems - suitable microbes
[00130] The antimicrobial activity of lignin-derived compounds creates a
need for a strain
of microbe that is tolerant to such activity in the reaction environment. The
teachings
include the identification of recombinant or non-recombinant microbial species
that are
naturally capable of metabolizing aromatic compounds for the
biotransformations of lignin-
derived compounds to commercial products.
[00131] Some examples of microbial species particularly suited for
biotransformations of
phenolic streams from pretreated lignocellulosic biomass include, but are not
limited to,
Azotobacter chroococcum, Azotobacter vinelandii, Novosphingobium
aromaticivorans,
Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas fluorescens,
Pseudomonas
stutzeri, Pseudomonas diminuta, Pseudomonas pseudoalcaligenes,
Rhodopseudomonas
palustris, Sphingomonas sp. A1, Sphingomonas paucimobilis SYK-6, Sphingomonas
japonicum, Sphingomonas alaskensis, Sphingomonas wittichii, Streptomyces
viridosporus, Delftia acidivorans, and Rhodococcus equi. Both bio-informatic
and
experimental data from the literature reveal the presence of extensive
metabolic activity
towards aromatic compounds in these strains, making them relevant species for
the
discovery of enzymes that hydrolyze lignin-derived oligomers, and for
biotransformations of
lignin core structures. Without intending to be bound by any theory or
mechanism of action,
these species exhibit, for example, metabolism of aromatic compounds such as
benzoate;
amino-, fluoro-, and chloro-benzoates; biphenyl; toluene and nitrotoluenes;
xylenes;
alkylbenzenes; styrene; atrazine; caprolactam; and polycyclic aromatic
hydrocarbons.
[00132] The microbes can be grown in a fermentor, for example, using
methods known to
one of skill. The enzymes used in the bioprocessing are obtained from the
microbes, and
they can be intracellular, extracellular, or a combination thereof. As such,
the enzymes can
be recovered from the host cells using methods known to one of skill in the
art that include,
for example, filtering or centrifuging, evaporation, and purification. In some
embodiments,
the method can include breaking open the host cells using ultrasound or a
mechanical device, removing debris, and extracting the protein, after which
the protein can be purified using,
for example, electrophoresis. In some embodiments, however, the teachings
include the
use of a microbe, recombinant or non-recombinant, that has tolerance to lignin-
derived
compounds. A microbe that is tolerant to lignin-derived compounds can be used
industrially,
for example, to express any enzyme, recombinant or non-recombinant, having a
desired
enzyme activity while directly in association with the lignin-derived
compounds. Such
activities include, for example, beta etherase activity, C-alpha-dehydrogenase
activity,
glutathione lyase activity, or any other enzyme activity that would be useful
in the
biotransformation of lignin-derived compounds. The activities can be wild-type
or produced
through methods known to one of skill, such as transfection or transformation,
for example.
Microbial systems - Azotobacter strains
[00133] The teachings herein are also directed to the discovery and use of
recombinant
Azotobacter strains heterologously expressing novel beta-etherase enzymes for
the
hydrolysis of lignin oligomers.
[00134] Research directed to the discovery of a suitable microbe has shown
that
Azotobacter vinelandii may possess the industrially relevant strain criteria
desired for the
teachings provided herein. In some embodiments, the criteria include (i)
growth on
inexpensive and defined medium, (ii) resistance to inhibitors in hydrolysates
of
lignocellulose, (iii) tolerance to acidic pH and higher temperatures, (iv) the
co-fermentation of
pentose and hexose sugars, (v) genetic tractability and availability of gene
expression tools,
(vi) rapid generation times, and (vii) successful growth performance in pilot
scale
fermentations. Additionally, key physiological traits that contribute to the
potential suitability
of A. vinelandii to the conversion of lignin-streams include an ability to
metabolize aromatic
compounds and xenobiotics. Moreover, it has been shown to have a tolerance to
phenolic
compounds in industrial waste streams. The annotated genome sequence of A.
vinelandii,
and the availability of genetic tools for its transformation and for the
heterologous expression
of enzymes, contribute to the potential of this microbe to function, in its
native form or as a
transformant, for example, in a high-yield production of industrial chemicals
from lignin
streams.
[00135] The teachings are also directed to a method of cleaving a beta-aryl
ether bond,
the method comprising contacting a polypeptide taught herein with a
lignin-derived compound
having (i) a beta-aryl ether bond and (ii) a molecular weight ranging from
about 180 Daltons
to about 3000 Daltons; wherein, the contacting occurs in a solvent environment
in which the
lignin-derived compound is soluble. The term "contacting" refers to placing an
agent, such
as a compound taught herein, with a target compound, and this placing can
occur in situ or
in vitro, for example.
[00136] The teachings are also directed to a method of cleaving a beta-aryl
ether bond,
the method comprising contacting a polypeptide taught herein with a
lignin-derived compound
having (i) a beta-aryl ether bond and (ii) a molecular weight ranging from
about 180 Daltons
to about 3000 Daltons; wherein, the contacting occurs in a solvent environment
in which the
lignin-derived compound is soluble. In some embodiments, the lignin-derived
compound
has a molecular weight of about 180 Daltons to about 1000 Daltons. In some
embodiments,
the solvent environment comprises water. And, in some embodiments, the solvent
environment comprises a polar organic solvent.
[00137] The teachings are also directed to a system for bioprocessing
lignin-derived
compounds, the system comprising a polypeptide taught herein, a lignin-derived
compound
having a beta-aryl ether bond and a molecular weight ranging from about 180
Daltons to
about 3000 Daltons; and, a solvent in which the lignin-derived compound is
soluble;
wherein, the system functions to cleave the beta-aryl ether bond by contacting
the
polypeptide with the lignin-derived compound in the solvent.
[00138] The teachings are also directed to a recombinant polynucleotide
comprising a
nucleotide sequence that encodes a polypeptide taught herein. Likewise, the
teachings are
also directed to a vector or plasmid comprising the polynucleotide, as well as
a host cell
transformed by the vector or plasmid to express the polypeptide.
[00139] The teachings are also directed to a method of cleaving a beta-aryl
ether bond,
the method comprising (i) culturing a host cell taught herein under conditions
suitable to
produce a polypeptide taught herein; (ii) recovering the polypeptide from the
host cell
culture; and, (iii) contacting the polypeptide of claim 1 with a lignin-
derived compound having
a beta-aryl ether bond and a molecular weight ranging from about 180 Daltons
to about
3000 Daltons; wherein, the contacting occurs in a solvent environment in which
the lignin-
derived compound is soluble.
[00140] In some embodiments, the host cell can be E. coli or an
Azotobacter strain, such
as Azotobacter vinelandii. And, in some embodiments, the lignin-derived
compound can
have a molecular weight of about 180 Daltons to about 1000 Daltons.
[00141] The teachings are also directed to a system for bioprocessing
lignin-derived
compounds, the system comprising (i) a transformed host cell taught herein;
(ii) a lignin-
derived compound having a beta-aryl ether bond and a molecular weight ranging
from about
180 Daltons to about 3000 Daltons; and, (iii) a solvent in which the
lignin-derived compound
is soluble; wherein, the system functions to cleave the beta-aryl ether bond
by contacting a
polypeptide taught herein with the lignin-derived compound in the solvent.
EXAMPLES
[00142] The following examples illustrate, but do not limit, the present
invention.
EXAMPLE 1
[00143] Microbial growth and metabolism studies on soluble lignin samples
are
performed to test the tolerance of microbes on lignin-derived compounds. A set
of aromatic
and nonaromatic compounds known to inhibit growth of E. coli and S. cerevisiae
strains
might be used to characterize the growth, tolerance and metabolic capability
of Azotobacter
vinelandii strain BAA1303, and A. chroococcum strain 4412 (EB Fred) X-50.
Metabolism of
various aromatic and nonaromatic compounds by microbial strains might be
determined as a
function of cellular respiration by the reduction of soluble tetrazolium salts
by actively
metabolizing cells. XTT (2,3-Bis(2-methoxy-4-nitro-5-sulfophenyl)-2H-
tetrazolium-5-carboxanilide inner salt, Sigma) is reduced to a soluble purple
formazan compound by respiring cells. E. coli might be used as the negative
control strain in this study. Strains might be grown in rich medium to
saturation, washed, and OD600nm of the cultures determined. Equal numbers of
bacteria will be inoculated into wells of the 48-well growth plate. Increasing
concentrations of aromatic and non-aromatic compounds, in the range of
0-500 mM, will be added to the wells to a final volume of 0.8 ml. Following
incubation for 24-48 hours with shaking at 25-37 °C, the cultures will be
tested for growth upon exposure to the test compounds using the XTT assay kit
(Sigma). Culture samples will be removed from the 48-well growth plate, and
diluted appropriately in 96-well assay plates to which the XTT reagent will
be added. Soluble formazan formed will be quantified by absorbance at 450 nm.
Increased
absorbance at 450nm will be indicative of growth or survival, or metabolism of
a particular
test compound by the strains. Table 3 lists some example compounds that can be
used to
test the tolerance of microbes on lignin-derived compounds.
[00144] Table 3.
No.  Test Compound
1    Syringic acid
2    Syringaldehyde
3    Gallic acid
4    Furfural
5    5-Hydroxymethylfurfural
6    4-hydroxybenzaldehyde
7    Hydroxybenzoate
8    Vanillin
9    Vanillic acid
10   Cinnamic acid
11   o-, m- and p-Coumaric acids
12   2-hydroxy-3-methoxybenzaldehyde
13   2,4,6-trihydroxybenzaldehyde
14   4-hydroxy-2,6-dimethoxybenzaldehyde
[00145] The set of lignin compounds to be tested might be expanded to any
of the
teachings provided herein. And, the microbial growth and metabolism studies on
soluble
lignin samples can also be performed on actual industrial samples such as, for
example, kraft
lignins and biorefinery lignins.
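As an illustration only, the XTT readout described in Example 1 could be reduced to a tolerance profile by subtracting a reagent blank from each 450 nm reading and expressing the result as a fraction of the no-compound control. The sketch below assumes one averaged reading per concentration; the function name and all absorbance values are hypothetical and are not data from this disclosure.

```python
def relative_respiration(a450_by_conc_mm, blank_a450):
    """Blank-corrected A450 at each concentration, as a fraction of the 0 mM control."""
    if 0 not in a450_by_conc_mm:
        raise ValueError("a 0 mM (no-compound) control reading is required")
    control = a450_by_conc_mm[0] - blank_a450
    if control <= 0:
        raise ValueError("control signal must exceed the blank")
    return {
        conc: (a450 - blank_a450) / control
        for conc, a450 in sorted(a450_by_conc_mm.items())
    }


# Hypothetical readings for one strain and one test compound (mM -> A450).
readings = {0: 1.10, 50: 0.90, 250: 0.50, 500: 0.20}
fractions = relative_respiration(readings, blank_a450=0.10)

for conc, fraction in fractions.items():
    print(f"{conc:>4} mM: {fraction:.2f} of control")
```

A fraction near 1 at a given concentration would suggest the strain tolerates the compound at that level; values above 1 might indicate that the compound is being metabolized.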
EXAMPLE 2
[00146] This example illustrates how prospective enzymes were identified
for use with
the teachings provided herein. Although never successfully expressed
heterologously as an
industrial microbe in a commercial scale process, Sphingomonas paucimobilis
has been
shown to produce enzymes that have some activity in cleaving the beta aryl
ether bond in
lignin. See Masai, E., et al. Accordingly, the enzyme discovery effort started
with running
BLAST searches against the two enzymes identified by Masai as having beta
etherase
activity, "ligE" and "ligF". See Id. at Abstract. Table 4 lists genes
identified in the BLAST
searches for initial screening.
[00147] Table 4.
   Gene    Species                          Activity                GenBank Accession #  Identity/Similarity (%)
1  ligE    Sphingomonas paucimobilis        Beta-etherase           BAA02032.1
2  ligE-1  Novosphingobium aromaticivorans  Putative Beta-etherase  ABD26841.1           (62%) (75%)
3  ligF    Sphingomonas paucimobilis        Beta-etherase           BAA02031.1
4  ligF-1  Novosphingobium aromaticivorans  Putative Beta-etherase  ABD26530.1           (60%) (77%)
           aromaticivorans                  Beta-etherase
           aromaticivorans                  Beta-etherase
[00148] The nucleotide and amino acid sequences in Table 4 are incorporated
herein by
reference in their entirety through the GenBank Accession Numbers.
EXAMPLE 3
[00149] This example describes a method for preparing recombinant host
cells for the
heterologous expression of known and putative beta-etherase encoding gene
sequences in
Escherichia coli (E. coli). E. coli is used in this example as a surrogate
enzyme production
host organism for the enzyme discovery. The construction of a novel industrial
host
microbe, A. vinelandii is described below.
[00150] The gene sequences with accession numbers in Table 4 were
synthesized directly as open reading frames (ORFs) from oligonucleotides by
using standard PCR-based assembly methods, and using the E. coli codon bias
with 10% threshold. The end sequences contained adaptors (NdeI and XhoI) for
restriction digestion and cloning into the E. coli expression vector pET24b
(Novagen). Internal NdeI and XhoI sites were excluded from the ORF sequences
during design of the oligonucleotides. Assembled genes were cloned into a
cloning vector (pG0V4), transformed into E. coli CH3 chemically competent
cells, and DNA sequences determined from purified plasmid DNA. After sequence
verification, restriction digestion was used to excise each ORF fragment from
the cloning vector, and the sequence sub-cloned into pET24b. The entire set of
ligE and ligF bearing
plasmids was then transformed into E. coli BL21 (DE3) which served as the
host strain for
beta-etherase expression and biochemical activity testing.
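The exclusion of internal NdeI and XhoI sites during oligonucleotide design can be checked with a simple scan of each ORF, applied before the cloning adaptors are appended. The sketch below is a minimal illustration rather than the procedure actually used: it relies on the standard recognition sequences CATATG (NdeI) and CTCGAG (XhoI), both palindromic so one strand suffices, and the two toy ORFs are invented for demonstration.

```python
# Standard recognition sequences; both are palindromic, so scanning the
# coding strand also covers the complementary strand.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}


def internal_sites(orf, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in the ORF."""
    orf = orf.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = orf.find(motif)
        while start != -1:
            positions.append(start)
            start = orf.find(motif, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical toy ORFs (without adaptors), for illustration only.
clean_orf = "ATGGCTAAAGGTTTCGACTAA"
bad_orf = "ATGGCTCATATGGGTCTCGAGTAA"

print(internal_sites(clean_orf))
print(internal_sites(bad_orf))
```

An ORF that returns an empty result can be excised intact with NdeI and XhoI; any reported position would need a synonymous codon change before synthesis.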
[00151] LigE, from Accession No BAA02032.1, is listed herein as SEQ ID NO:1
for the protein and SEQ ID NO:2 for the gene. An "optimized" nucleic acid
sequence was created to facilitate the transformation in E. coli and is listed
herein as SEQ ID NO:977.
[00152] LigE-1, from Accession No ABD26841.1, is listed herein as SEQ ID
NO:101 for the protein and SEQ ID NO:102 for the gene. An "optimized" nucleic
acid sequence was created to facilitate the transformation in E. coli and is
listed herein as SEQ ID NO:978.
[00153] LigF, from Accession No BAA02031.1 (P30347.1), is listed herein as
SEQ ID NO:513 for the protein and SEQ ID NO:514 for the gene. An "optimized"
nucleic acid sequence was created to facilitate the transformation in E. coli
and is listed herein as SEQ ID NO:979.
[00154] LigF-1, from Accession No ABD26530.1, is listed herein as SEQ ID
NO:539 for the protein and SEQ ID NO:540 for the gene. An "optimized" nucleic
acid sequence was created to facilitate the transformation in E. coli and is
listed herein as SEQ ID NO:980.
[00155] LigF-2, from Accession No ABD27301.1, is listed herein as SEQ ID
NO:541 for the protein and SEQ ID NO:542 for the gene. An "optimized" nucleic
acid sequence was created to facilitate the transformation in E. coli and is
listed herein as SEQ ID NO:981.
[00156] LigF-3, from Accession No ABD27309.1, is listed herein as SEQ ID
NO:545 for the protein and SEQ ID NO:546 for the gene. An "optimized" nucleic
acid sequence was created to facilitate the transformation in E. coli and is
listed herein as SEQ ID NO:982.
EXAMPLE 3
[00157] This example describes a method for gene expression in E. coli, as
well as beta-
etherase biochemical assays. Expression of known and putative beta-etherase
genes was
performed using 5 ml cultures of the recombinant E. coli strains described
herein in Luria
Broth medium by induction of gene expression using isopropylthiogalactoside
(IPTG) to a
final concentration of 0.1mM. Following induction, and cell harvest, the cells
were disrupted
using either sonication or the BPER (Invitrogen) cell lysis system.
[00158] Clarified cell extracts were tested in the in vitro biochemical
assay for beta-
etherase activity on a fluorescent substrate, a model lignin dimer compound
α-O-(β-methylumbelliferyl) acetovanillone (MUAV). In vitro reactions were
performed
in a total volume
of 200 µl and contained: 25 mM Tris-HCl pH 7.5; 0.5 mM dithiothreitol; 1 mM
glutathione; 0.05 mM or 0.1 mM MUAV; 10 µl of clarified cell extract used to
initiate the reactions. Following incubation for 2.5 hours at room
temperature, a 50 µl sample of the reactions was terminated using 150 µl of
300 mM glycine/NaOH buffer pH 9. The formation of 4-methylumbelliferone (4MU)
upon hydrolysis of the aryl ether bond was monitored by the increase in
fluorescence at λex = 360 nm and λem = 450 nm using a Spectramax
UV/visible/fluorescent spectrophotometer.
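For illustration, the pipetting volumes for one such reaction follow from the dilution relation C(stock) x V(stock) = C(final) x V(total). The sketch below uses the final concentrations, total volume, and extract volume stated above; the stock concentrations are assumptions chosen only to make the arithmetic concrete and are not given in this disclosure.

```python
def mix_volumes_ul(total_ul, final_mm, stock_mm, extract_ul):
    """Volume (µl) of each stock, the cell extract, and water for one reaction."""
    volumes = {
        name: final_mm[name] * total_ul / stock_mm[name] for name in final_mm
    }
    volumes["cell extract"] = extract_ul
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested total volume")
    volumes["water"] = water
    return volumes


# Final concentrations (mM) as stated in the assay description.
FINAL_MM = {
    "Tris-HCl pH 7.5": 25.0,
    "dithiothreitol": 0.5,
    "glutathione": 1.0,
    "MUAV": 0.1,
}
# Stock concentrations (mM) are ASSUMED for this sketch.
STOCK_MM = {
    "Tris-HCl pH 7.5": 1000.0,
    "dithiothreitol": 100.0,
    "glutathione": 100.0,
    "MUAV": 10.0,
}

recipe = mix_volumes_ul(200.0, FINAL_MM, STOCK_MM, extract_ul=10.0)
for component, volume in recipe.items():
    print(f"{component:<16} {volume:6.1f} µl")
```

The same function could be reused for the 0.05 mM MUAV condition by changing a single entry in `FINAL_MM`.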
[00159] The total protein concentrations of the cell lysates were
determined using the
BCA reagent system for protein quantification (Pierce).
[00160] Induction might be also performed using IPTG concentrations in the
range of
0.01-1mM. Cell disruption might be also performed using toluene
permeabilization, French
pressure techniques, or using multiple freeze/thaw cycles in conjunction with
lysozyme.
Assay conditions might be varied to include Tris-HCl at 10-150 mM
concentrations and in the
pH range of 6.5-8.5; 0-2 mM dithiothreitol; 0.05-2 mM glutathione; 0.01-5 mM
MUAV substrate; 22-42 °C reaction temperatures. The biochemical assay might be
performed as a
fixed time point assay with reaction times ranging from 5 minutes-12 hours, or
performed
continuously without quenching with glycine/NaOH buffer to extract enzyme
kinetic
parameters.
EXAMPLE 4
[00161] This example describes the tested biochemical activities of the
newly-discovered
beta-etherase enzymes.
[00162] FIG. 4 illustrates unexpected results from biochemical activity
assays for beta-
etherase function for the S. paucimobilis positive control polypeptides, and
the N.
aromaticivorans putative beta-etherase polypeptide, according to some
embodiments. The
much elevated beta-etherase activity exhibited by the putative ligE1 gene
product from N.
aromaticivorans as compared to the S. paucimobilis ligE gene product was a
completely
unexpected result of the enzyme discovery program.
[00163] In reactions containing 0.1 mM MUAV substrate, E. coli cell extracts
expressing
the N. aromaticivorans ligE1 protein yielded a total activity of 529 rfu/µg
compared to 7 rfu/µg
for the S. paucimobilis ligE protein. The newly discovered beta-etherase from
N.
aromaticivorans is approximately 75-fold more efficient than the previously
described S.
paucimobilis ligE beta-etherase enzyme. The highly efficient novel beta-
etherase is ideally
suited to be a biocatalyst for conversion of lignin aryl ethers to monomers in
biotechnological
processes.
[00164] It was also surprising to find that 3 novel N. aromaticivorans
polypeptides having
identities to the S. paucimobilis LigF sequence showed beta-etherase activity
on the MUAV
substrate. While all 3 putative ligF gene products from N. aromaticivorans
exhibited beta-
etherase activity, the LigF2 polypeptide is approximately 2-fold more
efficient than the S.
paucimobilis LigF protein. The N. aromaticivorans LigF2 protein yielded a
total activity of
1206 rfu/µg compared to 558 rfu/µg for the S. paucimobilis LigF protein.
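The fold differences quoted in this example follow directly from the reported total activities. As a short check, using only the four values stated in the text (the dictionary keys and function name are illustrative):

```python
# Total activities (rfu per µg total protein) as reported in the text.
ACTIVITY_RFU_PER_UG = {
    "S. paucimobilis LigE": 7.0,
    "N. aromaticivorans LigE1": 529.0,
    "S. paucimobilis LigF": 558.0,
    "N. aromaticivorans LigF2": 1206.0,
}


def fold_change(candidate, reference, activities=ACTIVITY_RFU_PER_UG):
    """Ratio of the candidate's activity to the reference enzyme's activity."""
    return activities[candidate] / activities[reference]


lig_e1_fold = fold_change("N. aromaticivorans LigE1", "S. paucimobilis LigE")
lig_f2_fold = fold_change("N. aromaticivorans LigF2", "S. paucimobilis LigF")

print(f"LigE1 vs LigE: {lig_e1_fold:.1f}-fold")
print(f"LigF2 vs LigF: {lig_f2_fold:.1f}-fold")
```

The ratios come out at roughly 75.6 and 2.2, consistent with the "approximately 75-fold" and "approximately 2-fold" figures above.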
[00165] As such, the enzyme discovery program unexpectedly and
surprisingly
generated four (4) novel polypeptides from N. aromaticivorans with beta-
etherase activity.
This set of enzymes shows great potential for the catalysis of a complete
depolymerization of
lignin-derived compounds. The results were unexpected and surprising for at
least the
following reasons:
[00166] Four (4) novel gene sequences encoding polypeptides with beta-
etherase activity
were discovered from N. aromaticivorans. These sequences have GenBank Nos.
ABD26841.1 (SEQ ID NO:101); ABD26530.1 (SEQ ID NO:539); ABD27301.1 (SEQ ID
NO:541); and ABD27309.1 (SEQ ID NO:545).
[00167] One of skill will appreciate that the bioinformatic screen that
was used to help
identify putative enzymes is not a definitive predictor in itself of
biochemical activities,
particularly in view of (i) having only one known active enzyme for LigE in a
different
species, (ii) one known active enzyme for LigF, and (iii) the unexpected
extent of such
activities discovered. The tests for function therefore had to be performed
empirically on the
N. aromaticivorans putative beta-etherase gene set.
[00168] One of skill will also appreciate that the discovery of beta-
etherase activities for
all 4 N. aromaticivorans polypeptides was a complete surprise given the
relatively low levels
of identities (37%-62%) the sequences had with respect to the S. paucimobilis
LigE and
LigF proteins.
[00169] One of skill will also appreciate that the discovery of 2 novel
beta-etherases from
the N. aromaticivorans with improved activities over the corresponding LigE
and LigF
proteins from S. paucimobilis was completely unexpected, and this exciting
discovery
provides a foundation for further enzyme development for industrial
applications.
EXAMPLE 5
[00170] This example describes the extended use of bioinformatics to
identify a pool of
putative enzymes in the discovery program. As noted above, the bioinformatic
screen that
was used to help identify putative enzymes initially was not a definitive
predictor in itself of
biochemical activities, particularly in view of (i) having only one known
active enzyme for
LigE in a different species, (ii) one known active enzyme for LigF, and (iii)
the unexpected
extent of such activities discovered. Having the additional known active
enzymes provided
more information that could be used to enhance the effectiveness of the
bioinformatics in
identifying the pool of putative enzymes for both LigE-type and LigF-type
enzymes.
[00171] Sequence to function correlations for the newly discovered beta-
etherases were
analyzed and identified. A bioinformatic survey of functional domains,
essential catalytic
residues, and sequence alignments was performed for the N. aromaticivorans
LigE and LigF
polypeptides. While not intending to be bound by any theory or mechanism of
action, the
rationale and key results of the survey include at least the following:
[00172] Identifying functional domains
[00173] As shown in FIG. 4, high levels of beta-etherase activities were
discovered for
the N. aromaticivorans LigE1 and LigF2 polypeptide sequences compared to the
S.
paucimobilis LigE and LigF proteins. The N. aromaticivorans LigE1 and LigF2
polypeptide
sequences were used as query sequences for the identification of functional
domains using
the Conserved Domain Database (CDD) in GenBank.
[00174] The N. aromaticivorans LigE1 polypeptide is annotated as a
glutathione S-
transferase (GST)-like protein with similarity to the GST C family, and the
beta-etherase
LigE subfamily. The LigE sub-family is composed of proteins similar to S.
paucimobilis beta
etherase, LigE, a GST-like protein that catalyzes the cleavage of the beta-
aryl ether linkages
present in low-molecular-weight lignins using reduced glutathione (GSH) as the
hydrogen
donor in the reaction. The GST fold contains an N-terminal thioredoxin-fold
domain and a
C-terminal alpha helical domain, with an active site located in a cleft
between the two
domains.
[00175] Table 5 describes conserved domains and essential amino acid
residues in the
N. aromaticivorans LigE1 polypeptide (ABD26841.1), according to some
embodiments. The
three (3) conserved functional domains annotated in the N. aromaticivorans
LigE1
polypeptide are: i) the dimer interface; ii) the N terminal domain; iii) the
lignin substrate
binding pocket or the H site. Amino acid residues defining the functional
domains in such
embodiments are residues 98-221 in the N. aromaticivorans LigE1 polypeptide.
[00176] Table 5 also lists fifteen (15) amino acid residues as conserved
and essential for
catalytic activity (column 3 of Table 5), according to some embodiments. These
include:
K100; A101; N104; P166; W107; Y184; Y187; R188; G191; G192; F195; V111; G112;
M115; F116. While not intending to be bound by any theory or mechanism of
action, these
residues appear responsible for the high beta-etherase catalytic activity
discovered for the
N. aromaticivorans LigE1 polypeptide compared to the S. paucimobilis ligE
polypeptide.
[00177] In such embodiments, the essential amino acid residues of the N.
aromaticivorans LigE1 polypeptide might be altered conservatively, and singly
or in
combination with similar amino acid residues that would retain or improve the
catalytic
function of the N. aromaticivorans LigE1 polypeptide. Examples of such
alternate residues
that might be incorporated at the essential positions are also shown in column
4 of Table 5.
[00178] Table 5.
Functional domain    Residues defining the     Conserved residues         Alternate residues
                     domain in N.              essential for catalysis    suggested for the
                     aromaticivorans LigE1     in N. aromaticivorans      essential positions
                                               LigE1

Dimer interface      residues 98-221 of        K100; A101; N104; P166     K100->R
                     SEQ ID NO:101                                        A101->L; I; V; G; S
                                                                          N104->Q; H; S; A

N terminal domain    residues 98-221 of        K100; W107; Y184; Y187;    K100->R
interface            SEQ ID NO:101             R188; G191; F195           W107->Y; F; A; S
                                                                          Y184->W; F; A; S
                                                                          Y187->W; F; A; S
                                                                          R188->K
                                                                          G191->L; I; V; A; S
                                                                          F195->W; Y; A; S

Lignin/substrate     residues 98-221 of        W107; V111; G112; M115;    W107->Y; F; A; S
binding pocket or    SEQ ID NO:101             F116; G192; F195           V111->L; I; G; A; S
H site                                                                    G112->L; I; V; A; S
                                                                          M115->S; A; G
                                                                          G192->L; I; V; A; S
                                                                          F195->W; Y; A; S
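To make the scope of column 4 of Table 5 concrete, the single-residue substitutions it suggests can be enumerated programmatically. The sketch below transcribes the alternates as read from the table (P166 and F116 have no alternates listed there and are therefore omitted); the variant naming such as "K100R" is a common shorthand assumed here, not notation taken from this disclosure.

```python
# Alternates transcribed from column 4 of Table 5 (one-letter amino acid codes).
# P166 and F116 are listed as essential but have no alternates in the table.
LIGE1_ALTERNATES = {
    "K100": "R",
    "A101": "LIVGS",
    "N104": "QHSA",
    "W107": "YFAS",
    "V111": "LIGAS",
    "G112": "LIVAS",
    "M115": "SAG",
    "Y184": "WFAS",
    "Y187": "WFAS",
    "R188": "K",
    "G191": "LIVAS",
    "G192": "LIVAS",
    "F195": "WYAS",
}


def single_variants(alternates):
    """List every single-substitution variant, e.g. 'K100R', in table order."""
    return [
        f"{position}{residue}"
        for position, residues in alternates.items()
        for residue in residues
    ]


variants = single_variants(LIGE1_ALTERNATES)
print(len(variants), "single-substitution variants")
print(variants[:6])
```

Combinations of substitutions, which the text also contemplates, would grow multiplicatively from this list and are not enumerated here.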
[00179] The N. aromaticivorans LigF2 polypeptide is annotated as a
glutathione S-
transferase (GST)-like protein with similarity to the GST C family, catalyzing
the conjugation
of glutathione with a wide range of xenobiotic agents.
[00180] Table 6 describes conserved domains and essential amino acid
residues in the
N. aromaticivorans LigF2 polypeptide (ABD27301.1), according to some
embodiments. The
three (3) conserved functional domains annotated for the N. aromaticivorans
LigF2
polypeptide are similar to those described for the N. aromaticivorans LigE
polypeptide and
comprise: i) the dimer interface; ii) the N terminal domain; iii) the
substrate binding pocket or
the H site. In such embodiments, amino acid residues defining the functional
domains are
residues 99-230 in the N. aromaticivorans LigF2 polypeptide.
[00181] Table 6 also lists sixteen (16) amino acid residues as conserved
and essential for
catalytic activity (column 3 of Table 6) of the N. aromaticivorans LigF2
polypeptide,
according to some embodiments. These include: R100; Y101; K104; K176; D107;
L194;
I197; N198; S201; M206; M111; N112; S115; M116; M206; H202. While not
intending to be
bound by any theory or mechanism of action, these 16 residues appear to be
responsible for
the high beta-etherase catalytic activity discovered for the N.
aromaticivorans LigF2
polypeptide compared to the S. paucimobilis LigF polypeptide.
[00182] In such embodiments, the essential amino acid residues of the N.
aromaticivorans LigF2 polypeptide might be altered conservatively, and singly
or in
combination with similar amino acid residues that would retain or improve the
catalytic
function of the N. aromaticivorans LigF2 polypeptide. Examples of such
alternate residues
that might be incorporated at the essential positions are shown in column 4 of
Table 6.
[00183] Table 6.
Functional domain    Residues defining the     Conserved residues         Alternate residues
                     domain in N.              essential for catalysis    suggested for the
                     aromaticivorans LigF2     in N. aromaticivorans      essential positions
                                               LigF2

Dimer interface      residues 99-230 of        R100; Y101; K104; K176     R100->K
                     SEQ ID NO:541                                        Y101->W; F; A; S
                                                                          K104->R
                                                                          K176->R

N terminal domain    residues 99-230 of        R100; D107; L194; I197;    R100->K
interface            SEQ ID NO:541             N198; S201; M206           D107->E
                                                                          L194->V; I; G; A; S
                                                                          I197->L; V; G; A; S
                                                                          N198->Q
                                                                          S201->A; M; G
                                                                          M206->S; A; G

Substrate binding    residues 99-230 of        D107; M111; N112; S115;    D107->E
pocket or H site     SEQ ID NO:541             M116; M206; H202           M111->S; A; G
                                                                          N112->Q
                                                                          S115->A; M; G
                                                                          M116->S; A; G
                                                                          M206->S; A; G
                                                                          H202->N; Q; S; M
[00184] Identifying additional functional domains
[00185] Bioinformatic methods were used to further understand the protein
structure that
may result in the desired activities. First, the LigE1 and LigF2 were analyzed
together.
Amino acid sequence alignments were performed using the N. aromaticivorans
ligE1
(ABD26841.1) and ligF2 (ABD27301.1) sequences using the BLAST-P program in
GenBank, and the ProDom and PraLine programs. Full length sequence alignments
yielded
hits with relatively low identities, for example, identities of <70%.
[00186] Next, regions in LigE1 and LigF2 were analyzed independently in
GENBANK.
For LigE1, an alignment was performed against the database in GENBANK using
the
following query sequence: "tispfvwatkyalkhkgfdldvvpggftgilertgg" (residues 19-
54 of SEQ ID
NO:101), from N. aromaticivorans ligE1. The BLAST yielded at least 3 subject
sequences
with high identities in the thioredoxin (TRX)-like superfamily of proteins
containing a TRX
fold. Many members contain a classic TRX domain with a redox active CXXC
motif.
[00187] Without intending to be bound by any theory or mechanism of
action, they are
thought to function as protein disulfide oxidoreductases (PDOs), altering the
redox state of
target proteins via the reversible oxidation of their active site dithiol. The
PDO members of
this superfamily include the families of TRX, protein disulfide isomerase
(PDI), TlpA,
glutaredoxin, NrdH redoxin, and bacterial Dsb proteins (DsbA, DsbC, DsbG,
DsbE,
DsbDgamma). Members of the superfamily that do not function as PDOs but
contain a
TRX-fold domain include phosducins, peroxiredoxins, glutathione (GSH)
peroxidases, SCO
proteins, GSH transferases (GST, N-terminal domain), arsenic reductases, TRX-
like
ferredoxins and calsequestrin, among others.
[00188] Table 7 lists 3 subject sequences having high identities (>80%)
to residues 19-54
of LigE-1 (SEQ ID NO:101). In some embodiments, these sequences are likely to
be
essential to catalytic functions similar to those discovered for the N.
aromaticivorans ligE1
polypeptide.
[00189] Table 7.
Subject sequence                        Species; Gene                           GenBank           Identity/Similarity to
                                                                                accession #       N. aromaticivorans LigE1
                                                                                                  query sequence residues
                                                                                                  19-54 (%)

(residues 19-54 of SEQ ID NO:1)         Sphingomonas paucimobilis;              BAA02032.1        89/97
TISPYVWRTKYALKHKGFDIDIVPGGFTGILERTGG    beta etherase

(residues 19-54 of SEQ ID NO:89)        Novosphingobium sp. PP1Y;               YP_004533906.1    86/92
TISPFVWRTKYALAHKGFDVDIVPGGFTGIAERTGG    glutathione S transferase like protein

(residues 19-54 of SEQ ID NO:3)         Sphingobium sp. SYK-6;                  BAJ11989.1        83/94
TISPFVWATKYAIAHKGFELDIVPGGFSGIPERTGG    beta-etherase
[00190] The nucleotide and amino acid sequences in Table 7 are
incorporated herein by
reference in their entirety through the GenBank Accession Numbers.
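The identity percentages in Table 7 can be reproduced from the listed sequences by position-wise comparison of equal-length, ungapped segments. The sketch below is a simplified illustration and not the BLAST procedure itself: it computes identity only (similarity would additionally require a substitution matrix), and the dictionary keys are descriptive labels chosen here.

```python
# Query: residues 19-54 of N. aromaticivorans LigE1, as quoted in the text.
QUERY = "TISPFVWATKYALKHKGFDLDVVPGGFTGILERTGG"

# Subject sequences as listed in Table 7.
SUBJECTS = {
    "Sphingomonas paucimobilis": "TISPYVWRTKYALKHKGFDIDIVPGGFTGILERTGG",
    "Novosphingobium sp. PP1Y": "TISPFVWRTKYALAHKGFDVDIVPGGFTGIAERTGG",
    "Sphingobium sp. SYK-6": "TISPFVWATKYAIAHKGFELDIVPGGFSGIPERTGG",
}


def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length (no gaps handled)")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


identities = {name: percent_identity(QUERY, seq) for name, seq in SUBJECTS.items()}
for name, value in identities.items():
    print(f"{name}: {value:.0f}% identity")
```

Rounded to whole percentages, the three values agree with the identity figures listed in Table 7.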
[00191] Likewise, for LigF2, separate alignments were performed against
the database in
GENBANK using the following 2 query sequences: "ainpegqvpvl" (residues 47-57
of SEQ
ID NO:541); and "iithttvineyled" (residues 63-76 of SEQ ID NO:541), from N.
aromaticivorans ligF2 (ABD27301.1), and yielded multiple subject sequences with
high identities
in the GST-N superfamily of proteins. Without intending to be bound by any
theory or
mechanism of action, the N terminal region (residues 43-75 of SEQ ID NO:541)
of the N.
aromaticivorans ligF2 polypeptide is annotated in the CDD to encompass:
[00192] i. N terminal residues thought to make contact with the C
terminal interface
in forming the tertiary protein structure for the GST-N family of proteins;
[00193] ii. N terminal residues thought to be involved in dimerization
of the
polypeptides; and,
[00194] iii. Residues thought to be involved in the binding of
glutathione substrate.
[00195] Table 8 provides the percent identities and similarities to N.
aromaticivorans
LigF2 query sequence residues 47-57.
[00196] Table 8.
Subject sequence                    Species; Gene                   GenBank         Identity/Similarity to N.
                                                                    accession #     aromaticivorans LigF2 query
                                                                                    sequence residues 47-57 (%)

(residues 45-55 of SEQ ID NO:983)   Proteus mirabilis ATCC 29906;   ZP_03840063.1   91/91
AINPKGQVPVL                         glutathione S-transferase

(residues 60-70 of SEQ ID NO:985)   Neisseria macacae ATCC 33926;   ZP_08683997.1   82/91
AINPQGQVPAL                         glutathione S-transferase

(residues 43-53 of SEQ ID NO:987)   Rhodospirillum rubrum;          YP_425114.1     82/91
AMNPEGEVPVL                         glutathione S-transferase-like
                                    protein

(residues 46-56 of SEQ ID NO:989)   Neisseria sicca ATCC 29256;     ZP_05317369.1   82/91
AINPQGQVPAL                         glutathione S-transferase

(residues 46-56 of SEQ ID NO:991)   Neisseria mucosa ATCC 25996;    ZP_05978410.1   82/91
AINPQGQVPAL                         glutathione S-transferase

(residues 19-29 of SEQ ID NO:993)   alpha proteobacterium BAL199;   ZP_02189431.1   82/91
AINPAGEVPVL                         Glutathione S-transferase-like
                                    protein

(residues 31-41 of SEQ ID NO:995)   Marinomonas sp. MED121;         ZP_01077889.1   91/91
AINPLGQVPVL                         glutathione S-transferase

(residues 46-55 of SEQ ID NO:997)   Proteus penneri ATCC 35198;     ZP_03805830.1   90/90
INPKGQVPVL                          hypothetical protein
                                    PROPEN_04226

(residues 45-55 of SEQ ID NO:999)   AURANDRAFT_7474 Aureococcus     EGB13094.1      82/91
AINPQGKVPVL                         anophagefferens;
                                    hypothetical protein
[00197] The nucleotide and amino acid sequences in Table 8 are
incorporated herein by
reference in their entirety through the GenBank Accession Numbers.
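The identity figures reported for short, ungapped peptide hits such as those in Table 8 can be checked by hand. The following is a minimal sketch in Python, assuming an ungapped comparison of equal-length peptides; the function name `percent_identity` is illustrative and not part of the original disclosure, and the similarity values are not reproduced because they depend on a substitution matrix that the text does not specify.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of identical positions between two ungapped, equal-length peptides."""
    query, subject = query.upper(), subject.upper()
    if not query or len(query) != len(subject):
        raise ValueError("sketch assumes non-empty, equal-length, ungapped peptides")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


# LigF2 query residues 47-57, as given in the text.
QUERY = "ainpegqvpvl"

# Three subject sequences copied from Table 8.
for subject in ("AINPKGQVPVL", "AINPQGQVPAL", "AMNPEGEVPVL"):
    # Prints 91, 82 and 82, matching the identity column of Table 8.
    print(subject, round(percent_identity(QUERY, subject)))
```

The 10-residue hit INPKGQVPVL would be compared against the matching 10-residue stretch of the query ("inpegqvpvl") under the same assumption.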
[00198] Table 9 provides the percent identities and similarities to N.
aromaticivorans LigF2 query sequence residues 63-76.
[00199] Table 9.
Subject sequence | Species; Gene | GenBank accession # | Identity/Similarity to N. aromaticivorans LigF2 query sequence residues 63-76 (%)
TVINEYLED (residues 107-115 of SEQ ID NO:1001) | Trichophyton verrucosum HKI 0517; conserved hypothetical protein | XP_003019921.1 | 100/100
TVINEYLED (residues 103-111 of SEQ ID NO:1003) | Arthroderma benhamiae CBS 112371; conserved hypothetical protein | XP_003017304.1 | 100/100
TVINEYLED (residues 72-80 of SEQ ID NO:1005) | Trichophyton rubrum CBS 118892; glutathione transferase | XP_003232549.1 | 100/100
IITESTVICEYLED (residues 62-75 of SEQ ID NO:1007) | Novosphingobium sp. PP1Y; glutathione S-transferase-like protein | YP_004533905.1 | 79/79
TVINEFLED (residues 84-92 of SEQ ID NO:1009) | Arthroderma gypseum CBS 118893; hypothetical protein MGYG_06412 | XP_003171868.1 | 89/100
TVINEFLED (residues 61-69 of SEQ ID NO:1011) | Trichophyton equinum CBS 127.97; hypothetical protein TEQG_03389 | EGE04518.1 | 89/100
[00200] The nucleotide and amino acid sequences in Table 9 are
incorporated herein by
reference in their entirety through the GenBank Accession Numbers.
[00201] The bioinformatics provides valuable information about protein
structure that can assist in identifying test candidates. For example, LigE1
has the 98-221 region, which is annotated in the databases as potentially
responsible as a component of binding and activity, for dimerization, and for
binding and catalysis in general. While not intending to be bound by any
theory or mechanism of action, the variability in active site structures is
reflected by the variability in substrate structures. Likewise, further
research using bioinformatics identified the 19-54 region, which is annotated
in the databases as a second region that is potentially a component of the
reductase function, and thus potentially responsible for catalysis in addition
to the 98-221 region, while being more conserved between members.
[00202] Obtaining additional structural information that will assist in
finding high
performing proteins within each family of strains is within the scope of the
teachings to the
extent that the methodology is known to one of skill. A variety of research
techniques are
known to one of skill. Bioinformatic methods, such as motif finding, are an
example of one
way to obtain the additional structural information. Motif finding, also known
as profile
analysis, constructs global multiple sequence alignments that attempt to align
short
conserved sequence motifs among the sequences in the query set. This can be
done, for
example, by first constructing a general global multiple sequence alignment,
after which
highly conserved regions are isolated, in a manner similar to what is taught
herein, and used
to construct a set of profile matrices. The profile matrix for each conserved
region is
arranged like a scoring matrix but its frequency counts for each amino acid or
nucleotide at
each position are derived from the conserved region's character distribution
rather than from
a more general empirical distribution. The profile matrices are then used to
search other
sequences for occurrences of the motif they characterize.
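The profile-matrix idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log-odds scoring, the uniform background distribution, the pseudocount of 1, and the names `build_profile` and `best_hit` are choices made here and are not taken from the original disclosure. The motif instances are the 11-residue subject sequences of Table 8; the target sequence is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(instances, pseudocount=1.0):
    """Build one log-odds column per motif position from ungapped, equal-length instances."""
    width = len(instances[0])
    if any(len(seq) != width for seq in instances):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(width):
        # Frequency counts come from the conserved region itself, as described in the text.
        counts = Counter(seq[pos].upper() for seq in instances)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Return (start, window, score) for the best-scoring window; start is 0-based."""
    sequence = sequence.upper()
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Characters outside the 20 standard residues score 0 (assumption).
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[2]:
            best = (start, window, score)
    return best


# 11-residue subject sequences copied from Table 8.
TABLE_8_INSTANCES = [
    "AINPKGQVPVL", "AINPQGQVPAL", "AMNPEGEVPVL", "AINPQGQVPAL",
    "AINPQGQVPAL", "AINPAGEVPVL", "AINPLGQVPVL", "AINPQGKVPVL",
]

profile = build_profile(TABLE_8_INSTANCES)
# Hypothetical target: the LigF2 query peptide embedded in made-up flanking residues.
target = "MKLYS" + "AINPEGQVPVL" + "EDGR"
start, window, score = best_hit(profile, target)
print(start, window, round(score, 2))
```

For this hypothetical target the best-scoring window starts at offset 5 and recovers the embedded peptide AINPEGQVPVL. A production search would typically use an empirical background distribution and a score threshold rather than reporting only the single best window.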
[00203] LigE-1 and LigF-2 were further examined by comparing their structures
to other polypeptides of the LigE-type and LigF-type, respectively. Table 10A
shows conserved residues between the polypeptide sequences of LigE and LigE-1,
and Table 10B shows conserved residues between the polypeptide sequences of
LigF and LigF-2.
[00204] Table 10A.
Res Pos | Res Pos | Res Pos | Res Pos | Res Pos | Res Pos
M 1 | P 42 | Y 78 | Q 129 | P 184 | P 235
A 2 | G 43 | L 79 | D 130 | N 185 | L 236
N 4 | G 44 | D 80 | Y 133 | A 187 | F 237
N 5 | F 45 | K 82 | V 134 | D 188 | G 238
T 6 | T 46 | Y 83 | S 137 | Y 189 | L 239
I 7 | G 47 | P 84 | R 138 | T 198 | R 242
T 8 | I 48 | D 85 | E 139 | A 199 | E 243
Y 10 | L 49 | R 86 | L 148 | S 200 | G 244
D 11 | E 50 | P 87 | E 149 | V 201 | D 245
L 12 | R 51 | L 89 | V 151 | T 204 | P 246
L 14 | T 52 | K 100 | Q 152 | P 205 | F 249
G 17 | G 53 | L 102 | A 153 | L 207 | R 251
T 19 | G 54 | D 103 | G 154 | D 210 | G 254
I 20 | E 57 | N 104 | R 155 | D 211 | G 257
S 21 | R 58 | W 105 | E 156 | P 212 | N 264
P 22 | P 60 | W 107 | R 158 | L 213 | G 266
V 24 | I 62 | A 110 | L 159 | R 214 | P 267
W 25 | V 63 | V 111 | P 160 | D 215 | T 270
T 27 | D 64 | G 112 | L 166 | W 216 | R 275
K 28 | D 65 | P 113 | E 167 | R 219 | E 278
Y 29 | G 66 | W 114 | P 168 | D 222 |
A 30 | E 67 | C 117 | R 170 | L 223 |
L 31 | V 69 | D 121 | L 173 | G 226 |
K 32 | L 70 | Y 122 | A 174 | L 227 |
H 33 | D 71 | D 124 | W 178 | G 228 |
K 34 | S 72 | L 125 | L 179 | R 229 |
G 35 | W 73 | S 126 | G 180 | H 230 |
F 36 | I 75 | L 127 | G 181 | P 231 |
D 37 | E 77 | P 128 | | G 232 |
D 39 | | | | |
V 41 | | | | |
[00205] As can be seen, there is a high degree of between-species
similarity between
LigE and LigE-1 in the LigE-type family. The LigE residues are from S.
paucimobilis
(BAA02032.1) and the LigE-1 residues are from N. aromaticivorans LigE1
(ABD26841.1).
The numbering is done according to the S. paucimobilis sequence (BAA02032.1)
in the
PRALINE alignment file (gaps not included).
[00206] Table 10B.
Res Pos | Res Pos
M 1 | D 89
Y 6 | R 97
P 10 | W 99
A 12 | K 101
N 13 | L 161
S 14 | K 167
K 16 | E 176
L 21 | L 179
E 23 | L 185
K 24 | Y 190
G 25 | L 192
L 26 | A 193
E 29 | D 194
D 34 | I 195
F 38 | P 221
E 39 | L 223
H 41 | W 226
F 45 | R 229
I 48 | R 233
N 49 | P 234
P 50 | A 235
G 52 |
V 54 |
P 55 |
T 65 |
T 68 |
I 70 |
E 72 |
Y 73 |
L 74 |
E 75 |
D 76 |
L 85 |
P 87 |
[00207] As can be seen, there is less between-species similarity between LigF
and LigF-2 in the LigF-type family. The LigF residues are from S. paucimobilis
(BAA02031.1) and the LigF-2 residues are from N. aromaticivorans (ABD27301.1).
Numbering is according to the S. paucimobilis sequence (BAA02031.1) in the
PRALINE alignment file (gaps not included).
EXAMPLE 6
[00208] This example provides additional sequences for a second round of
assays, the
sequences containing the 3 conserved functional domains described herein for
the GST C
family of proteins, and belonging to the beta-etherase LigE subfamily. Table
11 lists nine (9) additional sequences having identities of 51%-73% at the
amino acid level that were identified in the SwissProt database using the S.
paucimobilis LigE sequence (P27457.3) as the query. The bioinformatics
information suggests that these 9 sequences are excellent candidates for the
next round of synthesis, cloning, expression and testing for the desired
biochemical functions using the methods described herein.
[00209] Table 11.
# | Annotation | Accession # SwissProt/GenBank | Identity to S. paucimobilis LigE polypeptide (%)
7 | Dianthus caryophyllus; Glutathione S transferase | P28342.1/121736 | 59
8 | Euphorbia esula; Glutathione S transferase | P57108.1/11132235 | 51
9 | Zea mays; Glutathione S transferase | P04907.4/1170090 | 70
10 | Pseudomonas aeruginosa; Maleylacetoacetate isomerase | P57109.1/11133449 | 58
11 | Zea mays; Glutathione S transferase | P46420.2/1170092 | 63
12 | Arabidopsis thaliana; Glutathione S transferase | Q8L7C9.1/75329755 | 61
13 | Arabidopsis thaliana; Glutathione S transferase | P42769.1/1170093 | 73
14 | Oryza sativa Japonica Group; Probable Glutathione S transferase | O65857.2/57012737 | 59
15 | Oryza sativa Japonica Group; Probable Glutathione S transferase | O82451.3/57012739 | 62
[00210] The nucleotide and amino acid sequences in Table 11 are
incorporated herein by
reference in their entirety through the GenBank Accession Numbers.
EXAMPLE 7
[00211] This example describes how native lignin core structures can be
hydrolyzed by
the action of C alpha-dehydrogenases, beta-etherases, and glutathione-
eliminating
enzymes.
[00212] FIG. 5 illustrates beta-aryl-ether compounds to be tested as
substrates
representing native lignin structures, according to some embodiments. While
MUAV was
used as a model substrate in the identification of novel beta-etherase
enzymes, additional
aryl-ether compounds such as those shown in FIG. 5 might be used to assess
substrate
specificities of the beta-etherases towards dimers and trimers of aromatic
compounds
containing the beta-aryl ether linkage and representative of native lignin
structures. Higher
order oligomers of molecular weights <2000 might be synthesized and tested as
well. The
compounds might be obtained by custom organic synthesis, as for the
fluorescent substrate
MUAV.
[00213] FIG. 6 illustrates pathways of guaiacylglycerol-beta-guaiacyl ether
(GGE) metabolism by S. paucimobilis, according to some embodiments. Enzymes in
addition to LigE/F-like beta etherases might be required to hydrolyze native
lignin core structures. The model beta-aryl ether compound
guaiacylglycerol-beta-guaiacyl ether (GGE) is believed to contain the main
chemical linkages present in native lignin, including the hydroxyl, aryl-ether
and methoxy functionalities. The biotransformation of GGE to the lignin
monomer beta-hydroxypropiovanillone (beta-HPV) is partially understood for S.
paucimobilis, and proposed to occur via the action of 3 separate enzymes in a
step-wise manner. The ligD gene encodes a C alpha-dehydrogenase which oxidizes
GGE to alpha-(2-methoxyphenoxy)-beta-hydroxypropiovanillone (MPHPV); the ether
bond of MPHPV is cleaved by the beta-etherase activities of the ligE and ligF
gene products to yield the lignin monomer guaiacol, and
alpha-glutathionylhydroxypropiovanillone (GS-HPV), respectively. The ligG gene
encodes a glutathione (GSH)-eliminating glutathione S transferase (GST) which
catalyzes the elimination of glutathione (GSH) from GS-HPV to yield the lignin
monomer hydroxypropiovanillone (HPV).
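The step-wise pathway described above can be written down as a small data structure, which makes the order of the three enzymatic steps explicit. The following Python sketch is illustrative only: the `Step` type and the `trace` function are hypothetical names introduced here, and cofactors and co-substrates (for example glutathione) are deliberately left out.

```python
from typing import NamedTuple, Tuple


class Step(NamedTuple):
    gene: str
    enzyme: str
    substrate: str
    products: Tuple[str, ...]


# Steps as described in the text for S. paucimobilis; cofactors are omitted.
GGE_PATHWAY = (
    Step("ligD", "C alpha-dehydrogenase", "GGE", ("MPHPV",)),
    Step("ligE/ligF", "beta-etherase", "MPHPV", ("guaiacol", "GS-HPV")),
    Step("ligG", "GSH-eliminating glutathione S transferase", "GS-HPV", ("HPV",)),
)


def trace(pathway, start):
    """Apply each step in order and return the compounds left at the end."""
    pool = [start]
    for step in pathway:
        if step.substrate not in pool:
            raise ValueError(f"{step.gene}: substrate {step.substrate} is not available")
        pool.remove(step.substrate)
        pool.extend(step.products)
    return pool


print(trace(GGE_PATHWAY, "GGE"))  # ['guaiacol', 'HPV']
```

Removing the ligG step from the tuple leaves GS-HPV in the final pool, which mirrors the point made in the text that the beta-etherases alone do not release HPV.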
[00214] While the LigE and LigF polypeptides, or similar ones described
herein, might be
sufficient to hydrolyze native lignin structures, it would be useful to
discover novel C alpha
dehydrogenases (S. paucimobilis LigD homologs) and glutathione (GSH)-
eliminating
glutathione S transferases (S. paucimobilis LigG homologs) for industrial
applications. The
enzyme discovery programs might be conducted by methods similar to those
described
herein. Lignin substrates, intermediates, and products of biochemical
reactions might be detected following filtration and extraction of the
substrates and products into ethyl acetate. Substrates and products might be
separated using reverse phase HPLC conditions with a C18 column developed with
a gradient solvent system of methanol and water, and detected at 230 nm or
254 nm.
[00215] Table 12 lists potential C alpha-dehydrogenase polypeptide
sequences, the
LigD-type, for use in conjunction with beta etherases including, but not
limited to, LigE/F.
The sequences were identified using bioinformatic methods, such as those
taught herein.
These C alpha-dehydrogenases are classified in the CDD as short-chain
dehydrogenase/reductases (SDRs) and are a functionally diverse family of
oxidoreductases that have a single domain with a structurally conserved
Rossmann fold (alpha/beta folding pattern with a central beta-sheet), an
NAD(P)(H)-binding region, and a structurally diverse C-terminal region.
Classical SDRs are typically about 250 residues long, while extended SDRs are
approximately 350 residues. Sequence identity between different SDR enzymes is
typically in the 15-30% range, but the enzymes share the Rossmann fold
NAD-binding motif and characteristic NAD-binding and catalytic sequence
patterns.
[00216] Without intending to be bound by any theory or mechanism of action,
these enzymes are thought to catalyze a wide range of activities including the
metabolism of steroids, cofactors, carbohydrates, lipids, aromatic compounds,
and amino acids, and act in redox sensing. Classical SDRs have a TGXXX[AG]XG
cofactor binding motif and a YXXXK active site motif, with the Tyr residue of
the active site motif serving as a critical catalytic residue (Tyr-151, human
prostaglandin dehydrogenase (PGDH) numbering). In addition to the Tyr and Lys,
there is often an upstream Ser (Ser-138, PGDH numbering) and/or an Asn
(Asn-107, PGDH numbering) contributing to the active site; while substrate
binding is in the C-terminal region, which determines specificity.
[00217] Without intending to be bound by any theory or mechanism of action,
the
standard reaction mechanism is thought to be a 4-pro-S hydride transfer and
proton relay
involving the conserved Tyr and Lys, a water molecule stabilized by Asn, and
nicotinamide.
Extended SDRs have additional elements in the C-terminal region, and typically
have a
TGXXGXXG cofactor binding motif. Complex (multidomain) SDRs such as
ketoreductase
domains of fatty acid synthase can have a GGXGXXG NAD(P)-binding motif and an
altered
active site motif (YXXXN). Fungal type ketoacyl reductases can have a
TGXXXGX(1-2)G
NAD(P)-binding motif. Some atypical SDRs are thought to have lost catalytic
activity and/or
have an unusual NAD(P)-binding motif and missing or unusual active site
residues.
Reactions catalyzed within the SDR family can include isomerization,
decarboxylation,
epimerization, C=N bond reduction, dehydratase activity, dehalogenation, Enoyl-
CoA
reduction, and carbonyl-alcohol oxidoreduction.
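The SDR sequence motifs named above lend themselves to a simple pattern search. The following Python sketch translates the motif notation used in the text (X for any residue, brackets for alternatives, X(1-2) for one or two residues) into regular expressions and scans a sequence. The helper names `motif_to_regex` and `find_motifs` and the test sequence are hypothetical; a hit is only a sequence-level hint and does not by itself establish that a protein is a functional SDR.

```python
import re

# Motifs exactly as written in the text.
SDR_MOTIFS = {
    "classical cofactor-binding": "TGXXX[AG]XG",
    "classical active site": "YXXXK",
    "extended cofactor-binding": "TGXXGXXG",
    "complex NAD(P)-binding": "GGXGXXG",
    "complex active site": "YXXXN",
    "fungal ketoacyl reductase NAD(P)-binding": "TGXXXGX(1-2)G",
}


def motif_to_regex(motif):
    """Translate the motif notation into a compiled regular expression."""
    pattern = re.sub(r"X\((\d+)-(\d+)\)", r".{\1,\2}", motif)  # X(1-2) -> .{1,2}
    pattern = pattern.replace("X", ".")                        # X -> any residue
    return re.compile(pattern)


def find_motifs(sequence):
    """Return {motif name: [(1-based start, matched text), ...]} for motifs that occur."""
    sequence = sequence.upper()
    hits = {}
    for name, motif in SDR_MOTIFS.items():
        found = [(m.start() + 1, m.group()) for m in motif_to_regex(motif).finditer(sequence)]
        if found:
            hits[name] = found
    return hits


# Hypothetical sequence fragment, made up for illustration only.
example = "MKVALVTGASSGIGKATAEALYAAAKHG"
for name, found in find_motifs(example).items():
    print(name, found)
```

Note that `finditer` reports non-overlapping matches only, which is sufficient for short motifs like these but would need a lookahead pattern if overlapping occurrences mattered.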
[00218] Table 12.
# | Species | GenBank Accession Numbers | Identity/Similarity to S. paucimobilis LigD polypeptide (%)
1 | N. aromaticivorans | YP_495487.1 | 78/88
2 | N. aromaticivorans | YP_496072.1 | 39/58
3 | N. aromaticivorans | YP_496073.1 | 39/59
4 | N. aromaticivorans | YP_495984.1 | 35/56
5 | N. aromaticivorans | YP_497149.1 | 38/58
[00219] The nucleotide and amino acid sequences in Table 12 are
incorporated herein by
reference in their entirety through the GenBank Accession Numbers.
[00220] Table 13 lists potential LigG (glutathione-eliminating)-like enzyme
sequences for
use in conjunction with beta etherases including, but not limited to, LigE/F.
The sequences
were identified using bioinformatic methods, such as those taught herein.
These might be
utilized in conjunction with C-alpha dehydrogenases, and/or with LigE/F-like
beta-etherases.
The LigG-like proteins are annotated in the CDD as glutathione S-transferase
(GST)-like
proteins with similarity to the GST C family, the GST-N family, and the
thioredoxin (TRX)-
like superfamily of proteins containing a TRX fold.
[00221] Table 13.
# | Species | GenBank Accession Numbers | Identity/Similarity to S. paucimobilis LigG polypeptide (%)
1 | N. aromaticivorans | YP_498160.1 | 23/41
2 | A. vinelandii DJ | YP_002798340 | 32/50
[00222] The nucleotide and amino acid sequences in Table 13 are
incorporated herein by
reference in their entirety through the GenBank Accession Numbers.
EXAMPLE 8
[00223] This example describes the creation of a novel recombinant
microbial system for
the conversion of lignin oligomers to monomers. Azotobacter vinelandii strain
BAA-1303
DJ, for example, might be transformed with beta-etherase encoding genes from
N. aromaticivorans with the objective of creating a lignin phenolics-tolerant
A. vinelandii strain
capable of converting lignin oligomers to monomers at high yields in
industrial processes.
Table 14 lists additional A. vinelandii strains that might be used as host
strains for beta-
etherase gene expression, for example, by their strain designation and
American Type
Culture Collection (ATCC) number.
[00224] Table 14.
# | Strain Designation | ATCC Number
1 | Wisconsin 0 | 12518
2 | 3a | 12837
3 | AV-3 | 13266
4 | AV-4 | 13267
5 | AV-5 | 13268
6 | OP | 13705
7 | 135 [VKM B-547] | -
8 | Ad116 | 17962
9 | NRS 16 | 25308
10 | UWD | 478
11 | 113 | 53800
12 | B-1 | 7484
13 | B-4 | 7487
14 | B-6 | 7489
15 | B-9 | 7492
16 | 37 | 9046
17 | V1 | 7496
18 | 3 | 9047
[00225] The heterologous production of beta etherases, C alpha-dehydrogenases,
and other enzymes for the production of lignin monomers and aromatic products
in A. vinelandii might be achieved using the expression plasmid system
described herein. The broad host range multicopy plasmid pKT230 (ATCC)
encoding streptomycin resistance might be used for gene cloning. Genes can be
synthesized by methods described above, and cloned into the SmaI site of
pKT230. The nifH promoter from A. vinelandii strain BAA 1303 DJ can be used
to control gene expression.
[00226] A. vinelandii strain BAA 1303 DJ might be transformed with pKT230
derivatives using electroporation of electrocompetent cells (Eppendorf
method), or by incubation of plasmid DNA with chemically competent cells
prepared in TF medium (1.9718 g of MgSO4, 0.0136 g of CaSO4, 1.1 g of
CH3COONH4, 10 g of glucose, 0.25 g of KH2PO4, and 0.55 g of K2HPO4 per liter).
Transformants might be selected by screening for resistance to streptomycin.
Gene expression might be induced by cell growth in nitrogen-free Burk's medium
(0.2 g of MgSO4, 0.1 g of CaSO4, 0.5 g of yeast extract, 20 g of sucrose,
0.8 g of K2HPO4, and 0.2 g of KH2PO4, with trace amounts of FeCl3 and Na2MoO4,
per liter).
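Because both media are specified per liter, the amounts for any other batch volume follow by simple scaling. The Python sketch below encodes the two recipes exactly as listed above and scales them; the dictionary and function names are hypothetical, and the trace components of Burk's medium (FeCl3 and Na2MoO4) are left out because the text gives no quantities for them.

```python
# Grams per liter, copied from the text.
TF_MEDIUM_G_PER_L = {
    "MgSO4": 1.9718,
    "CaSO4": 0.0136,
    "CH3COONH4": 1.1,
    "glucose": 10.0,
    "KH2PO4": 0.25,
    "K2HPO4": 0.55,
}

# Nitrogen-free Burk's medium; trace FeCl3 and Na2MoO4 are not quantified in the text.
BURKS_MEDIUM_G_PER_L = {
    "MgSO4": 0.2,
    "CaSO4": 0.1,
    "yeast extract": 0.5,
    "sucrose": 20.0,
    "K2HPO4": 0.8,
    "KH2PO4": 0.2,
}


def scale_recipe(recipe_g_per_l, volume_l):
    """Return grams of each component needed for the requested volume in liters."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {name: round(grams * volume_l, 4) for name, grams in recipe_g_per_l.items()}


# Example: a 250 mL batch of TF medium and a 2 L batch of Burk's medium.
print(scale_recipe(TF_MEDIUM_G_PER_L, 0.25))
print(scale_recipe(BURKS_MEDIUM_G_PER_L, 2.0))
```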
[00227] The biochemical activity of a newly-discovered beta-etherase enzyme
functionally expressed in A. vinelandii strain BAA 1303 DJ can be tested using
methods
known to one of skill, such as the methods provided herein. Biochemical
activity assays for
beta-etherase function, and for total protein might be performed as described
herein.
EXAMPLE 9
[00228] This example describes the design and use of recombinant
Azotobacter strains
heterologously expressing enzymes for the production of high value aromatic
compounds
from lignin core structures. Table 15 lists a few examples of aromatic
compounds that might
be produced by the microbial platforms described herein.
[00229] Table 15.
Chemical Product | Market Volume (metric ton/year) | Market Value ($/lb) | Uses
Catechol | 30x10^3 | 2.34 | Antioxidant: 4-tert-butylcatechol. Flavors: piperonal; veratrol. Insecticides: carbofuran; propoxur.
Vanillin | 20x10^3 | 6.12 | Flavor agent. Precursor for pharmaceutical methyldopa.
2,4-Diaminotoluene | 3x10^6 | 1.65 | Precursor to toluene diisocyanates for urethane polymers.
Salicylic acid | 1.6x10^3 (US) | 3.92 | Precursor to analgesic drug acetylsalicylic acid. Precursor to fragrances: amyl and methyl esters of salicylic acid.
Aminosalicylic acid | | 57.38 | Tuberculosis drug.
ortho-Cresol | 38x10^3 | 0.8 | Precursors to herbicides: 4-chloro-2-methylphenoxyacetic acid; 2-(4-chloro-2-methylphenoxy)-propionic acid.
[00230] One example of a microbial process to a commercial aromatic
compound might
be the production of catechol from lignin-derived phenolic compounds. Catechol
might be
produced from guaiacol using an A. vinelandii or A. chroococcum strain
engineered with
enzymes including beta-etherases and demethylases, or demethylase enzymes
alone.
Azotobacter strains might be engineered to express the heterologous enzymes by
the
methods described herein.
[00231] FIG. 7 illustrates an example of a biochemical process for the
production of
catechol from lignin oligomers, according to some embodiments. The biochemical
processes leading to aromatic products such as catechol might be designed as 3
unit
operations described below:
[00232] i) Fractionation of soluble lignin - Concentration or partial
purification of
soluble biorefinery lignin fractions or phenolic streams using methods known
to one of skill.
[00233] ii) Biotransformation - The biotransformation of the phenolic
substrate
stream might be carried out in a fed-batch bioprocess using Azotobacter
strains engineered
to specifically and optimally convert specific lignin-derived phenolic
substrates to the final
product, such as catechol. Corn steep liquor might be used as the base medium
in the
biotransformations. The phenolic stream might be introduced in fed-batch mode,
at
concentrations that will be tolerated by the strains.
[00234] iii) Product separation - The product, such as catechol, might
be purified
from the aqueous culture broths using standard chemical separation methods
such as
liquid-liquid extractions (LLE) with solvents of varying polarities applied in
a sequential
manner.
[00235] Additional examples of designed biochemical routes to aromatic
products are
described below:
[00236] i) Lignin-derived syringic acid might be converted to gallic
acid via a 2-step
biochemical conversion using aryl aldehyde oxidases and demethylases.
[00237] ii) Lignin-derived vanillin might be converted to
protocatechuic acid via a 2-
step biochemical conversion using aryl aldehyde oxidases and demethylases.
[00238] iii) Lignin-derived vanillin might be converted to catechol via
a 3-step
biochemical conversion using aryl aldehyde oxidases, aromatic decarboxylases,
and
demethylases.
[00239] iv) Lignin-derived 2-methoxytoluene might be converted to the
urethane
precursor 2,4-diaminotoluene via a 4-step biochemical conversion using
demethylases,
ferulate-5-hydroxylases, 2,4-nitrophenol oxidoreductases, and 2,4-nitrobenzene
reductases.
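The four designed routes listed above can be collected in a small table-like structure, which makes it easy to see which enzyme classes are shared between routes and would therefore need to be engineered only once. The following Python sketch is illustrative: the `Route` type and helper names are hypothetical, and the enzyme order within each tuple simply follows the order in which the text names them, which is not necessarily the order of action.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    substrate: str
    product: str
    steps: int
    enzymes: Tuple[str, ...]


# Routes i) to iv) as listed in the text.
ROUTES = (
    Route("syringic acid", "gallic acid", 2,
          ("aryl aldehyde oxidase", "demethylase")),
    Route("vanillin", "protocatechuic acid", 2,
          ("aryl aldehyde oxidase", "demethylase")),
    Route("vanillin", "catechol", 3,
          ("aryl aldehyde oxidase", "aromatic decarboxylase", "demethylase")),
    Route("2-methoxytoluene", "2,4-diaminotoluene", 4,
          ("demethylase", "ferulate-5-hydroxylase",
           "2,4-nitrophenol oxidoreductase", "2,4-nitrobenzene reductase")),
)


def products_from(substrate, routes=ROUTES):
    """List the products reachable from one lignin-derived substrate."""
    return [r.product for r in routes if r.substrate == substrate]


def enzymes_needed(routes=ROUTES):
    """Distinct enzyme classes across the given routes, in first-seen order."""
    seen = []
    for route in routes:
        for enzyme in route.enzymes:
            if enzyme not in seen:
                seen.append(enzyme)
    return seen


print(products_from("vanillin"))  # ['protocatechuic acid', 'catechol']
print(enzymes_needed())
```

In this listing the stated step count of every route equals the number of enzyme classes named for it, and demethylases appear in all four routes.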
[00240] In each case, the specific enzymes might be engineered into A.
vinelandii or A.
chroococcum strains, for example, and the process might be performed using
unit
operations similar to those described herein for the biochemical production of
catechol.
[00241] FIG. 8 illustrates an example of a biochemical process for the
production of
vanillin from lignin oligomers, according to some embodiments. Vanillin can be
used as a
flavoring agent, and as a precursor for pharmaceuticals such as methyldopa.
Synthetic
vanillin, for example, can be produced from petroleum-derived guaiacol by
reaction with
glyoxylic acid. Vanillin, however, can also be produced from lignin-derived
beta-hydroxypropiovanillone (beta-HPV) according to the process scheme
indicated in FIG. 8. A 2-step biochemical route to vanillin from beta-HPV can
be achieved using the
enzymes 2,4-
dihydroxyacetophenone oxidoreductase, and vanillin dehydrogenase or carboxylic
acid
reductases, engineered into A. vinelandii.
[00242] FIG. 9 illustrates an example of a biochemical process for the
production of 2,4-
diaminotoluene from lignin oligomers, according to some embodiments. Toluene
diisocyanate (TDI) can be used in the manufacture of polyurethanes. For
example, 2,4-
diaminotoluene (2,4-DAT) is the key precursor to TD I. Diaminotoluenes can be
produced
industrially by the sequential nitration of toluene with nitric acid, followed
by the reduction of
the dinitrotoluenes to the corresponding diaminotoluenes. Both nitration and
reduction
reactions yield mixtures of toluene isomers from which the 2,4-DAT isomer is
purified by
distillation. The conversion of lignin-derived 2-methoxytoluene to 2,4-DAT can
be achieved
according to the process scheme outlined in FIG. 9. 2-methoxytoluene can be
converted
to 2,4-DAT by A. vinelandii engineered with 4 enzymes to specifically
demethylate,
hydroxylate, nitrate and aminate methoxytoluene.
[00243] FIG. 10 illustrates process schemes for additional product targets
that include
ortho-cresol, salicylic acid, and aminosalicylic acid, for the production of
valuable chemicals
from lignin oligomers, according to some embodiments. These chemicals, as with
the
others, have traditionally been obtained from the problematic petrochemical
processes. A
few of the process schemes for producing these chemicals using the teachings
herein,
based on guaiacol or 2-methoxytoluene, are shown schematically in FIG. 10.
Designed
biochemical routes, combined with the remarkable phenolics-tolerance traits of
Azotobacter
strains are proposed for conversions of lignin structures to industrial and
fine chemicals.
EXAMPLE 10
[00244] This example describes potential LigE-, LigF-, LigG-, and LigD-type
polypeptides,
and the genes encoding them. The potential polypeptides were identified using
bioinformatic methods, such as those taught herein.
[00245] As described above, the query sequences in the initial pass for the
LigE-type and
LigF-type were Sphingomonas paucimobilis sequences, such as those discussed in
Masai,
E., et al. Likewise, the query sequences for the LigG-type and LigD-type were
also
Sphingomonas paucimobilis sequences, such as those discussed in Masai. The
following
sequences were used in the initial pass for all queries:
[00246] LigE, from Accession No BAA02032.1, is listed herein as SEQ ID NO:1
for the protein and SEQ ID NO:2 for the gene.
[00247] LigF, from Accession No BAA02031.1 (P30347.1), is listed herein as
SEQ ID NO:513 for the protein and SEQ ID NO:514 for the gene.
[00248] LigG, from Accession No Q9Z339.2, is listed herein as SEQ ID NO:733
for the
protein and SEQ ID NO:734 for the gene.
[00249] LigD, from Accession No Q01198.1, is listed herein as SEQ ID NO:777
for the
protein and SEQ ID NO:778 for the gene.
[00250] The following sequences were used in a modified query to further
refine the LigE-
type and LigF-type, and the query sequences were the LigE-1 and LigF-2 that
showed the
surprising and unexpected results shown in FIG. 4:
[00251] LigE-1, from Accession No ABD26841.1, is listed herein as SEQ ID
NO:101 for
the protein and SEQ ID NO:102 for the gene.
[00252] LigF-2, from Accession No ABD27301.1, is listed herein as SEQ ID
NO:541 for
the protein and SEQ ID NO:542 for the gene.
[00253] Table 16 lists SEQ ID NOs:1-246, which are potential protein
sequences of the
LigE-type, as well as a respective gene sequence encoding the protein. Table
17 lists SEQ
ID NOs:247-576, which are potential protein sequences of the LigF-type, as
well as a
respective gene sequence encoding the protein. Table 18 lists SEQ ID NOs:577-
776, which
are potential protein sequences of the LigG-type, as well as a respective gene
sequence
encoding the protein. Table 19 lists SEQ ID NOs: 777-976, which are potential
protein
sequences of the LigD-type, as well as a respective gene sequence encoding the
protein.
[00254] Bioinformatic methods, such as those described herein, can be used
to suggest
an efficient order of experimentation to identify additional potential enzymes
for use with the
teachings provided herein. Moreover, mutations and amino acid substitutions
can be used to test effects on enzyme activity to further understand the
structure of the most active proteins with respect to the enzyme functions
sought by teachings provided herein.
[00255] Table 16.
PROTEIN GENE GENBANK DESCRIPTION:
TYPE
SEQ ID SEQ ID ACCESSION
NO: NO: NO:
1 2 BAA02032.1 Sphingomonas paucimobilis LIGE
3 4 BAJ11989.1 beta-etherase [Sphingobium sp. SYK-6] LIGE
glutathione S-transferase domain-containing
LIGE
6 EFV85608.1 protein [Achromobacter xylosoxidans C54]
7 8
EFW42705.1 predicted protein [Capsaspora owczarzaki ATCC LIGE
Glutathione S-transferase domain-containing
LIGE
9 10 EGE55257.1 protein [Rhizobium etli CNPAF512]
glutathione S-transferase domain-containing
LIGE
11 12 EGP48556.1 protein [Achromobacter xylosoxidans AXX-A]
13 14 EGP57475.1 lignin degradation protein [Agrobacterium LIGE
Glutathione S-transferase [Rhodotorula glutinis LIGE
16 EGU12703.1 ATCC 204091]
glutathione S-transferase domain-containing
LIGE
17 18 EGU56510.1 protein [Vibrio tubiashii ATCC 19109]
hypothetical protein pTi-SAKURA_p086
LIGE
[Agrobacterium tumefaciens] >dbjIBAA87709.1I
19 20 NP 053324.1 tiorf84 [Agrobacterium tumefaciens]
lignin beta-ether hydrolase [Mesorhizobium loti LIGE
MAFF303099] >dbj1BAB54276.11lignin beta-
21 22 NP 108131.1 ether hydrolase [Mesorhizobium loti
lignin degradation protein [Agrobacterium
LIGE
tumefaciens str. C58] >gbIAAK86925.21lignin
23 24 NP 354140.2 degradation protein [Agrobacterium tumefaciens
putative BETA-etherase (BETA-aryl ether
LIGE
cleaving enzyme) protein [Sinorhizobium meliloti
1021] >embICAC45742.11 Putative beta-
etherase (beta-aryl ether cleaving enzyme)
protein [Sinorhizobium meliloti 1021]
>gbIAEG03720.11Glutathione S-transferase
domain protein [Sinorhizobium meliloti BL225C]
26 NP 385269 1 >gbIAEH79753.11 putative BETA-etherase
= . . .
ligninase [Bradyrhizobium japonicUm USDA 110] LIGE
>dbjIBAC52692.1I ligE [Bradyrhizobium
27 28 NP 774067.1 japonicum USDA 110]
putative lignin beta-ether hydrolase
LIGE
[Rhodopseudomonas palustris CGA009]
29 30 NP 949676.1 >embICAE29781.11 putative lignin beta-ether
PROTEIN GENE GENBANK DESCRIPTION: TYPE
SEQ ID SEQ ID ACCESSION
NO: NO: NO:
RecName: Full=Beta-etherase; AltName: LIGE
Full=Beta-aryl ether cleaving enzyme
>gbIAAA25878.1I beta-etherase [Sphingomonas
31 32 P27457.3 paucimobilis] >dbjIBAA02032.11beta-etherase
hypothetical protein SCHCODRAFT_85860 LIGE
[Schizophyllum commune H4-8]
33 34 P_003028922. >gbIEF194019.11 hypothetical protein
hypothetical protein SCHCODRAFT_57691 LIGE
[Schizophyllum commune H4-8]
35 36 P_003030384. >gblEF195481.11 hypothetical protein
hypothetical protein SCHCODRAFT_81614 LIGE
[Schizophyllum commune H4-8]
37 38 P_003033715. >gbIEF198812.11 hypothetical protein
hypothetical protein NECHADRAFT_55532 LIGE
[Nectria haematococca mpVI 77-13-4]
>gblEEU35500.11 hypothetical protein
39 40 P_003041213. NECHADRAFT_55532 [Nectria haematococca
41 42 XP 382462.1 hypothetical protein FG02286.1 [Gibberella zeae
LIGE
putative glutathione S-transferase (GST) LIGE
[Bradyrhizobium sp. 0R5278]
43 44 P001207860. >embICAL79645.11 putative glutathione 5-
glutathione S-transferase domain-containing LIGE
protein [Acidiphilium cryptum JF-5]
>gbIABQ32287.11Glutathione S-transferase, N-
45 46 P_001236206. terminal domain protein [Acidiphilium cryptum
JF=
putative glutathione S-transferase LIGE
[Bradyrhizobium sp. BTAi1] >gbIABQ33995:11
47 48P 001237901. putative glutathione S-transferase (GST)
hypothetical protein Swit_1652 [Sphingomonas LIGE
wittichii RW1] >gbIABQ68015.11 hypothetical
49 50 (P_001262153. protein Swit 1652 [Sphingomonas wittichii RW1]
glutathione S-transferase domain-containing LIGE
protein [Sinorhizobium medicae WSM419]
>gbIABR59630.1I Glutathione S-transferase
51 52 (P_001326465. domain [Sinorhizobium medicae W5M419]
glutathione S-transferase domain-containing LIGE
protein [Parvibaculum lavamentivorans DS-1]
>gbIABS63563.1I Glutathione S-transferase
53 54 (P_001413220. domain [Parvibaculum lavamentivorans DS-1]
PROTEIN GENE GENBANK DESCRIPTION: TYPE
SEQ ID SEQ ID ACCESSION
NO: NO: NO:
glutathione S-transferase [Azorhizobium LIGE
caulinodans ORS 571] >dbjIBAF89264.11
55 56 (P 001526182. glutathione S-transferase [Azorhizobium
lignin degradation protein [Sorangium cellulosum LIGE
'So ce 56'] >embICAN96036.11lignin
57 58P 001616516. degradation protein [Sorangium cellulosum 'So
glutathione S-transferase domain-containing LIGE
protein [Methylobacterium sp. 4-46]
59 60 (P 001772944. >gblACA20610.11 Glutathione S-transferase
glutathione S-transferase domain-containing LIGE
protein [Beijerinckia indica subsp. indica ATCC
9039] >gb|ACB95969.1| Glutathione S-
61 62 P_001833458. transferase domain [Beijerinckia indica subsp.
beta-aryl ether cleaving enzyme, lignin LIGE
degradation protein [Rhizobium etli CIAT 652]
>gblACE90517.11 beta-aryl ether cleaving
63 64 P_001977695. enzyme, lignin degradation protein [Rhizobium
glutathione S-transferase domain-containing LIGE
protein [Rhodopseudomonas palustris TIE-1]
>gblACF03309.11Glutathione S-transferase
65 66 (P_001993784. domain [Rhodopseudomonas palustris TIE-1]
glutathione S-transferase domain [Rhizobium LIGE
leguminosarum bv. trifolii WSM2304]
>gb|ACI54372.1| Glutathione S-transferase
67 68 P_002280598. domain [Rhizobium leguminosarum bv. trifolii
glutathione S-transferase [Oligotropha LIGE
carboxidovorans OM5] >ref|YP_004631892.1|
beta etherase [Oligotropha carboxidovorans
OM5] >gb|ACI94284.1| glutathione S-
transferase [Oligotropha carboxidovorans OM5]
>gb|AEI02075.1| putative beta etherase
69 70 P_002290149. [Oligotropha carboxidovorans OM4]
glutathione S-transferase domain-containing LIGE
protein [Methylocella silvestris BL2]
71 72P 002362903. >gblACK51541.11glutathione S-transferase
glutathione S-transferase domain-containing LIGE
protein [Methylobacterium nodulans ORS 2060]
>gblACL61802.11Glutathione S-transferase
73 74 P_002502105. domain protein [Methylobacterium nodulans
lignin degradation protein [Agrobacterium vitis LIGE
S4] >gblACM36110.11lignin degradation protein
75 76 (P_002549116. [Agrobacterium vitis S4]
glutathione S-transferase-like protein LIGE
[Azotobacter vinelandii DJ] >gblAC076830:11
77 78 P002797805. Glutathione S-transferase-like protein
putative lignin beta-ether hydrolase LIGE
[Sinorhizobium fredii NGR234]
79 80 P002825455. >gblACP24702.11 putative lignin beta-ether
glutathione S-transferase domain protein LIGE
[Rhizobium leguminosarum bv. trifolii WSM1325]
>gb|ACS55517.1| Glutathione S-transferase
81 82 P_002975056. domain protein [Rhizobium leguminosarum bv.
lignin degradation protein [Agrobacterium sp. LIGE
H13-3] >gbIADY64039.11lignin degradation
83 84 (P_004278359. protein [Agrobacterium sp. H13-3]
putative beta-etherase [Acidiphilium multivorum LIGE
AIU301] >dbj|BAJ82791.1| putative beta-
85 86 (P_004285673. etherase [Acidiphilium multivorum AIU301]
glutathione S-transferase-like protein LIGE
[Pseudomonas mendocina NK-01]
87 88 P004378290. >gbIAEB56538.1I glutathione S-transferase-like
glutathione S-transferase-like protein LIGE
[Novosphingobium sp. PP1Y]
89 90 P004533906. >emb|CCA92088.1| glutathione S-transferase-
glutathione S-transferase domain-containing LIGE
protein [Sinorhizobium meliloti AK83]
91 92 P004548326. >gbIAEG52712.11Glutathione S-transferase
glutathione S-transferase domain-containing LIGE
protein [Mesorhizobium opportunistum
WSM2075] >gb|AEH89616.1| Glutathione S-
93 94 (P 004613710. transferase domain protein [Mesorhizobium
putative lignin beta-etherase [Colwellia LIGE
psychrerythraea 34H] >gbIAAZ24120.11 putative
95 96 YP 269568.1 lignin beta-etherase [Colwellia psychrerythraea
beta-aryl ether cleaving enzyme, lignin LIGE
degradation protein [Rhizobium etli CFN 42]
>gbIABC90274.1I beta-aryl ether cleaving
97 98 YP 469001.1 enzyme, lignin degradation protein [Rhizobium
glutathione S-transferase-like protein LIGE
[Rhodopseudomonas palustris HaA2]
99 100 YP 487746.1 >gbIABD08835.11Glutathione S-transferase-like
glutathione S-transferase-like protein LIGE
[Novosphingobium aromaticivorans DSM 12444]
>gbIABD26841.1I glutathione S-transferase-like
101 102 YP 497675.1 protein [Novosphingobium aromaticivorans DSM
glutathione S-transferase-like protein LIGE
[Rhodopseudomonas palustris BisB18]
103 104 YP 533979.1 >gbIABD89660.1Iglutathione S-transferase-like
glutathione S-transferase-like protein LIGE
[Chromohalobacter salexigens DSM 3043]
>gbIABE60032.1I glutathione S-transferase-like
105 106 YP 574731.1 Protein [Chromohalobacter salexigens DSM
glutathione S-transferase-like protein LIGE
[Trichodesmium erythraeum IMS101]
107 108 YP 723508.1 >gbIABG53035.1Iglutathione S-transferase-like
etherase [Rhizobium leguminosarum bv. viciae LIGE
3841] >emb|CAK07074.1| putative etherase
109 110 YP_767183.1 [Rhizobium leguminosarum bv. viciae 3841]
glutathione S-transferase [Rhodopseudomonas LIGE
palustris BisA53] >gb|ABJ08111.1| Glutathione
111 112 YP 783091.1 S-transferase [Rhodopseudomonas palustris
glutathione S-transferase domain-containing LIGE
protein [Paracoccus denitrificans PD1222]
>gbIABL69699.1I Glutathione S-transferase, N-
113 114 YP 915395.1 terminal domain [Paracoccus denitrificans
putative beta-etherase (beta-aryl ether cleaving LIGE
enzyme) protein [Phaeobacter gallaeciensis
BS107] >gbIEDQ11875.11 putative beta-
115 116 ZP 02146530.1 etherase (beta-aryl ether cleaving enzyme)
putative beta-etherase (beta-aryl ether cleaving LIGE
enzyme) protein [Phaeobacter gallaeciensis
2.10] >gbIEDQ08644.1I putative beta-etherase
117 118 ZP 02149699.1 (beta-aryl ether cleaving enzyme) protein
putative beta-etherase (beta-aryl ether cleaving LIGE
enzyme) protein [Hoeflea phototrophica DFL-43]
>gbIEDQ33834.11 putative beta-etherase (beta-
119 120 ZP 02166231.1 aryl ether cleaving enzyme) protein [Hoeflea
glutathione S-transferase-like protein [alpha LIGE
proteobacterium BAL199] >gb|EDP62276.1|
121 122 zp 02190934: glutathione S-transferase-like protein [alpha
123 124 ZP 03503368. Glutathione S-transferase domain [Rhizobium
LIGE
125 126 ZP 03507162.' Glutathione S-transferase domain [Rhizobium
LIGE
127 128 ZP 03513891.' Glutathione S-transferase domain [Rhizobium
LIGE
129 130 ZP 03519388.' Glutathione S-transferase domain [Rhizobium
LIGE
131 132 ZP 03520502.' putative etherase [Rhizobium etli GR56]
LIGE
glutathione S-transferase, N-terminal domain LIGE
[Pseudovibrio sp. JE062] >gblEEA94709:11
133 134 zp 05084767: glutathione S-transferase, N-terminal domain
lignin degradation protein [Achromobacter LIGE
piechaudii ATCC 43553] >gbIEFF74366.1I lignin
135 136 ZP 06688746: degradation protein [Achromobacter piechaudii
glutathione S-transferase family protein LIGE
[Roseomonas cervicalis ATCC 49957]
>gbIEFH10151.11 glutathione S-transferase
137 138 ZP 06898146: family protein [Roseomonas cervicalis ATCC
Glutathione S-transferase domain protein [Afipia LIGE
sp. 1NLS2] >gb|EFI51229.1| Glutathione S-
139 140 ZP_07027473.' transferase domain protein [Afipia sp. 1NLS2]
beta-etherase [Ahrensia sp. R2A130] LIGE
141 142 ZP 07373940.' >gbIEFL90585.1I beta-etherase [Ahrensia sp.
Glutathione S-transferase [gamma LIGE
proteobacterium IMCC1989] >gb|EGG95341.1|
143 144 ZP 08328512: Glutathione S-transferase [gamma
lignin degradation protein [Agrobacterium sp. LIGE
ATCC 31749] >gb1EGL63395.11 lignin
145 146 ZP 08529965.' degradation protein [Agrobacterium sp. ATCC
lignin beta-ether hydrolase [Bradyrhizobiaceae LIGE
bacterium SG-6C] >gblEGP10168.11 lignin beta-
147 148 ZP 08627134: ether hydrolase [Bradyrhizobiaceae bacterium
Glutathione S-transferase domain-containing LIGE
protein [Acidiphilium sp. PM] >gblEG096849.11
149 150 ZP 08631370: Glutathione S-transferase domain-containing
Glutathione S-transferase domain-containing LIGE
protein [Acidiphilium sp. PM] >gblEG093307.11
151 152 ZP 08634908: Glutathione S-transferase domain-containing
glutathione S-transferase domain-containing LIGE
protein [Halomonas sp. TD01] >gblEGP21558.11
153 154 zp 08635074.1 glutathione S-transferase domain-containing
hypothetical protein SERLA73DRAFT_115219 LIGE
[Serpula lacrymans var. lacrymans S7.3]
>gblEG019163.11 hypothetical protein
155 156 EGN93792.1 SERLADRAFT_453680 [Serpula lacrymans var.
hypothetical protein SERLA73DRAFT_188253 LIGE
[Serpula lacrymans var. lacrymans S7.3]
>gblEG019875.11 hypothetical protein
157 158 EGN94392.1 SERLADRAFT_478300 [Serpula lacrymans var.
hypothetical protein SERLA73DRAFT_186005 LIGE
[Serpula lacrymans var. lacrymans S7.3]
>gblEG021854.11 hypothetical protein
159 160 EGN96317.1 SERLADRAFT_474829 [Serpula lacrymans var.
hypothetical protein SERLA73DRAFT_185168 LIGE
[Serpula lacrymans var. lacrymans S7.3]
>gblEG022516.11 hypothetical protein
161 162 EGN96924.1 SERLADRAFT_473468 [Serpula lacrymans var.
hypothetical protein SERLA73DRAFT_107446 LIGE
[Serpula lacrymans var. lacrymans S7.3]
>gblEG025928.11 hypothetical protein
163 164 EG000367.1 SERLADRAFT_415302 [Serpula lacrymans var.
conserved hypothetical protein [Aspergillus LIGE
terreus NIH2624] >gblEAU33805.11 conserved
165 166 P_001215222. hypothetical protein [Aspergillus terreus
hypothetical protein AOR_1_322094 [Aspergillus LIGE
oryzae RIB40] >dbjIBAE62801.11 unnamed
167 168 P_001823934. protein product [Aspergillus oryzae RIB40]
hypothetical protein CC1G_07903 [Coprinopsis LIGE
cinerea okayama7#130] >gblEAU82621.11
169 170 P_001839188. hypothetical protein CC1G_07903 [Coprinopsis
predicted protein [Laccaria bicolor S238N-H82] LIGE
>gb|EDR03530.1| predicted protein [Laccaria
171 172 P_001885678. bicolor S238N-H82]
conserved hypothetical protein [Penicillium LIGE
marneffei ATCC 18224] >gblEEA19427.11
173 174 P_002152364. conserved hypothetical protein [Penicillium
conserved hypothetical protein [Aspergillus LIGE
flavus NRRL3357] >gblEED49097.11 conserved
175 176 P_002380998. hypothetical protein [Aspergillus flavus
hypothetical protein MPER_07394 LIGE
[Moniliophthora perniciosa FA553]
177 178 P_002392962. >gblEEB93892.11 hypothetical protein
predicted protein [Postia placenta Mad-698-R] LIGE
>gblEED86077.11 predicted protein [Postia
179 180 P_002468854. placenta Mad-698-R]
predicted protein [Postia placenta Mad-698-R] LIGE
>gblEED82308.11 predicted protein [Postia
181 182 P_002472522. placenta Mad-698-R]
Pc12g05530 [Penicillium chrysogenum LIGE
Wisconsin 54-1255] >embICAP80180.11
183 184 P_002557398. Pc12g05530 [Penicillium chrysogenum
hypothetical protein SCHCODRAFT_12387 LIGE
[Schizophyllum commune H4-8]
185 186 P_003026159. >gblEF191256.11 hypothetical protein
hypothetical protein SCHCODRAFT_111982 LIGE
[Schizophyllum commune H4-8]
187 188 P_003028923. >gblEF194020.11 hypothetical protein
Glutathione S-transferase domain-containing LIGE
protein [Cyanothece sp. PCC 7822]
189 190 P003890246. >gbIADN16971.11Glutathione S-transferase
glutathione S-transferase-like [Halomonas LIGE
elongata DSM 2581] >embICBV41472.11
191 192 P003896657. glutathione S-transferase-like [Halomonas
glutathione S-transferase [Achromobacter LIGE
xylosoxidans A8] >gb|ADP17667.1| glutathione
193 194 P_003980382. S-transferase, N-terminal domain protein 4
glutathione S-transferase domain-containing LIGE
protein [Rhodopseudomonas palustris DX-1]
>gbIADU46105.11Glutathione S-transferase
195 196 'F1004110838. domain [Rhodopseudomonas palustris DX-1]
glutathione S-transferase [Mesorhizobium ciceri LIGE
biovar biserrulae WSM1271] >gbIADV13817.11
Glutathione S-transferase domain
197 198 'P 004143867. [Mesorhizobium ciceri biovar biserrulae
conserved hypothetical protein [Congregibacter LIGE
litoralis KT71] >gblEAQ98305.11 conserved
199 200 ZP 01102591. hypothetical protein [Congregibacter litoralis
201 202 AAA87183.1 auxin-induced protein [Vigna radiata] LIGE
203 204 AAG34797.1 glutathione S-transferase GST 7 [Glycine max]
LIGE
205 206 AA069664.1 glutathione S-transferase [Phaseolus acutifolius]
LIGE
207 208 ACU24385.1 unknown [Glycine max] LIGE
209 210 ADP99065.1 glutathione S-transferase [Marinobacter LIGE
putative glutathione S-transferase [Acinetobacter LIGE
211 212 ADY82158.1 calcoaceticus PHEA-2]
213 214 BAA77215.1 beta-etherase [Sphingomonas paucimobilis] LIGE
hypothetical protein CC1G_12612 [Coprinopsis LIGE
cinerea okayama7#130] >gblEAU82225.11
215 216 1='_001839584. hypothetical protein CC1G_12612 [Coprinopsis
predicted protein [Populus trichocarpa] LIGE
217 218 P_002336443. >gblEEE73479.11 predicted protein [Populus
hypothetical protein SCHCODRAFT_59314 LIGE
[Schizophyllum commune H4-8]
219 220 P_003028624. >gbIEF193721.11 hypothetical protein
DEHA2A00660p [Debaryomyces hansenii LIGE
CBS767] >emb|CAG84310.1| DEHA2A00660p
221 222 XP 456365.1 [Debaryomyces hansenii]
hypothetical protein [Cryptococcus neoformans LIGE
var. neoformans JEC21] >refIXP_773999.11
hypothetical protein CNBH0460 [Cryptococcus
neoformans var. neoformans B-3501A]
>gblEAL19352.11 hypothetical protein
CNBH0460 [Cryptococcus neoformans var.
223 224 XP 572781.1 neoformans B-3501A] >gbIAAW45474.11
glutathione S-transferase domain-containing LIGE
protein [Acidiphilium cryptum JF-5]
>gbIABQ32287.11Glutathione S-transferase, N-
225 226 'P 001236206. terminal domain protein [Acidiphilium cryptum
JF-5]
putative glutathione S-transferase LIGE
[Bradyrhizobium sp. BTAi1] >gbIABQ33995:11
227 228P 001237901. putative glutathione S-transferase (GST)
hypothetical protein Swit_1652 [Sphingomonas LIGE
wittichii RW1] >gbIABQ68015.11 hypothetical
229 230 001262153. protein Swit 1652 [Sphingomonas wittichii RW1]
glutathione S-transferase domain-containing LIGE
protein [Sinorhizobium medicae WSM419]
>gbIABR59630.1I Glutathione S-transferase
231 232 (P_001326465. domain [Sinorhizobium medicae WSM419]
glutathione S-transferase domain-containing LIGE
protein [Parvibaculum lavamentivorans DS-1]
>gbIABS63563.1I Glutathione S-transferase
233 234 (P_001413220. domain [Parvibaculum lavamentivorans DS-1]
glutathione S-transferase [Azorhizobium LIGE
caulinodans ORS 571] >dbjIBAF89264.11
235 236 P001526182. glutathione S-transferase [Azorhizobium
glutathione S-transferase [Synechococcus LIGE
elongatus PCC 6301] >reflYP_399807.11
glutathione S-transferase [Synechococcus
elongatus PCC 7942] >dbjIBAD78939.11
glutathione S-transferase [Synechococcus
237 238 YP 171459.1 elongatus PCC 6301] >gbIABB56820.11
glutathione S-transferase-like protein [Anabaena LIGE
variabilis ATCC 29413] >gbIABA21529.11
239 240 YP 322424.1 Glutathione S-transferase-like protein
glutathione S-transferase, putative [marine LIGE
gamma proteobacterium HTCC2080]
>gblEAW41324.11glutathione S-transferase,
241 242 ZP 01625805.1putative [marine gamma proteobacterium
Glutathione S-transferase-like protein [Nodularia LIGE
spumigena CCY9414] >gblEAW44220.11
243 244 ZP 01631145. Glutathione S-transferase-like protein
[Nodularia
glutathione S-transferase [Acinetobacter LIGE
calcoaceticus RUH2202] >gblEEY78560.11
245 246 ZP 06057261.1glutathione S-transferase [Acinetobacter

[00256] Table 17.
PROTEIN GENE GENBANK DESCRIPTION: TYPE
SEQ ID SEQ ID ACCESSION
NO: NO: NO:
glutathione S-transferase, class-phi LigF
247 248 AAB65163.1 [Solanum commersonii]
glutathione S-transferase GST 42 [Zea LigF
249 250 AAG34850.1 mays]
putative glutathione S-transferase LigF
OsGSTU7 [Oryza sativa Japonica
251 252 AAK98535.1 Group]
glutathione S-transferase [Allium cepa] LigF
253 254 AAL61612.1
Intracellular chloride channel [Medicago LigF
255 256 ABE86679.1 truncatula]
Intracellular chloride channel [Medicago LigF
257 258 ABE86683.1 truncatula]
glutathione S-transferase [Solanum LigF
259 260 ABQ96853.1 tuberosum]
glutathione-S-transferase LigF
261 262 ACF15452.1 [Phanerochaete chrysosporium]
glutathione S-transferase GSTU6 [Zea LigF
263 264 ACG44597.1 mays]
265 266 ACJ86045.1 unknown [Medicago truncatula] LigF
Probable maleylacetoacetate isomerase LigF
267 268 AC015091.1 2 [Caligus clemensi]
phi class glutathione transferase GSTF7 LigF
269 270 ADB11335.1 [Populus trichocarpa]
glutathione S-transferase [Medicago LigF
271 272 BAB70616.1 sativa]
glutathione S-transferase [Allium cepa] LigF
273 274 BAF56180.1
predicted protein [Hordeum vulgare LigF
subsp. vulgare] >dbjIBAJ99460.11
predicted protein [Hordeum vulgare
275 276 BAJ90004.1 subsp. vulgare]
glutathione S-transferase GST1 LigF
277 278 CAI51314.2 [Capsicum chinense]
hypothetical protein OsI_34425 [Oryza LigF
279 280 EAY79299.1 sativa Indica Group]
hypothetical protein OsJ_32234 [Oryza LigF
281 282 EAZ16758.1 sativa Japonica Group]
hypothetical protein OsI_34397 [Oryza LigF
283 284 EEC67342.1 sativa Indica Group]
glutathione S-transferase LigF
285 286 EFV87279.1 [Achromobacter xylosoxidans C54]
hypothetical protein LigF
SERLA73DRAFT 190579 [Serpula
lacrymans var. lacrymans S7.3]
>gblEG026403.11 hypothetical protein
SERLADRAFT 463437 [Serpula
287 288 EGN92742.1 lacrymans var. lacrymans S7.9]
hypothetical protein FOXI3_13869 LigF
289 290 EGU75635.1 [Fusarium oxysporum Fo5176]
Os10g0525600 [Oryza sativa Japonica LigF
Group] >gb|AAM12493.1|AC074232_20
putative glutathione S-transferase [Oryza
sativa Japonica Group]
>dbj|BAF27029.1| Os10g0525600
291 292 NP 001065115.1 [Oryza sativa Japonica Group]
Os10g0527400 [Oryza sativa Japonica LigF
Group] >gb|AAM12310.1|AC091680_11
putative glutathione S-transferase [Oryza
sativa Japonica Group]
>gb|AAM12478.1|AC074232_5 putative
glutathione S-transferase [Oryza sativa
Japonica Group] >gb|AAP54729.1|
glutathione S-transferase GSTU6,
putative, expressed [Oryza sativa
Japonica Group] >dbj|BAF27032.1|
Os10g0527400 [Oryza sativa Japonica
Group] >gb|EEE51298.1| hypothetical
protein OsJ_32225 [Oryza sativa
293 294 NP 001065118.1 Japonica Group]
Os10g0529300 [Oryza sativa Japonica LigF
Group] >gb|AAK98546.1|AF402805_1
putative glutathione S-transferase
OsGSTU18 [Oryza sativa Japonica
Group] >gb|AAM12302.1|AC091680_3
putative glutathione S-transferase [Oryza
sativa Japonica Group]
>gb|AAM94529.1| putative glutathione S-
transferase [Oryza sativa Japonica
Group] >gb|AAP54753.1| glutathione S-
transferase GSTU6, putative, expressed
[Oryza sativa Japonica Group]
>dbj|BAF27040.1| Os10g0529300
[Oryza sativa Japonica Group]
>gb|EAY79288.1| hypothetical protein
OsI_34414 [Oryza sativa Indica Group]
>dbj|BAG87628.1| unnamed protein
product [Oryza sativa Japonica Group]
>dbj|BAG97643.1| unnamed protein
product [Oryza sativa Japonica Group]
>dbj|BAG87189.1| unnamed protein
295 296 NP_001065126.1 product [Oryza sativa Japonica Group]
Os10g0529900 [Oryza sativa Japonica LigF
Group] >gb|AAM12331.1|AC091680_32
putative glutathione S-transferase [Oryza
sativa Japonica Group]
>gb|AAM94517.1| putative glutathione S-
transferase [Oryza sativa Japonica
Group] >gb|AAP54759.1| glutathione S-
transferase GSTU6, putative [Oryza
sativa Japonica Group]
>dbj|BAF27046.1| Os10g0529900
[Oryza sativa Japonica Group]
>gb|EAZ16763.1| hypothetical protein
OsJ_32239 [Oryza sativa Japonica
297 298 NP 001065132.1 Group]
LOC542632 [Zea mays] LigF
>gb|AAG34835.1|AF244692_1
glutathione S-transferase GST 27 [Zea
mays] >gb|ACF85142.1| unknown [Zea
299 300 NP 001105627.1 mays]
glutathione S-transferase GSTU6 [Zea LigF
mays] >gb|ACG46501.1| glutathione S-
301 302 NP 001152229.1 transferase GSTU6 [Zea mays]
putative glutathione S-transferase LigF
protein [Sinorhizobium meliloti 1021]
>reflYP_004550950.1Iglutathione S-
transferase domain-containing protein
[Sinorhizobium meliloti AK83]
>emb|CAC41740.1| Putative glutathione
S-transferase [Sinorhizobium meliloti
1021] >gbIAEG06303.11Glutathione S-
transferase domain protein
[Sinorhizobium meliloti BL225C]
>gbIAEG55336.11Glutathione S-
transferase domain protein
[Sinorhizobium meliloti AK83]
>gbIAEH81005.11 putative glutathione S-
transferase protein [Sinorhizobium
303 304 NP 384409.1 meliloti SM11]
hypothetical protein BC1G_05597 LigF
[Botryotinia fuckeliana B05.10]
>gbIEDN24875.1I hypothetical protein
BC1G_05597 [Botryotinia fuckeliana
305 306 XP 001555922.1 B05.10]
hypothetical protein SNOG_15716 LigF
[Phaeosphaeria nodorum SN15]
>gblEAT76811.21 hypothetical protein
SNOG 15716 [Phaeosphaeria nodorum
307 308 XP 001805855.1 SN15]
predicted protein [Populus trichocarpa] LigF
>gb1EEE99635.11 predicted protein
309 310 XP 002321320.1 [Populus trichocarpa]
hypothetical protein LigF
SORBIDRAFT_03g025210 [Sorghum
bicolor] >gblEES00904.11 hypothetical
protein SORBIDRAFT_03g025210
311 312 XP 002455784.1 [Sorghum bicolor]
hypothetical protein LigF
SORBIDRAFT_01g030860 [Sorghum
bicolor] >gblEER94604.11 hypothetical
protein SORBIDRAFT_01g030860
313 314 XP 002467606.1 [Sorghum bicolor]
PREDICTED: ganglioside-induced LigF
differentiation-associated protein 1-like
315 316 XP 002734706.1 [Saccoglossus kowalevskii]
PREDICTED: ganglioside-induced LigF
differentiation-associated protein 1-like
317 318 XP 002734707.1 [Saccoglossus kowalevskii]
PREDICTED: Glutathione S-Transferase LigF
family member (gst-42)-like
319 320 XP 002737947.1 [Saccoglossus kowalevskii]
hypothetical protein LigF
SELMODRAFT 184606 [Selaginella
moellendorffii] >gbIEFJ09414.11
hypothetical protein
SELMODRAFT 184606 [Selaginella
321 322 XP 002989538.1 moellendorffii]
glutathione S-transferase domain- LigF
containing protein [Loa loa]
>gbIEF017107.1Iglutathione S-
transferase domain-containing protein
323 324 XP 003146962.1 [Loa loa]
glutathione S-transferase domain- LigF
containing protein [Pseudomonas
mendocina ymp] >gbIABP84676.11
Glutathione S-transferase, N-terminal
domain protein [Pseudomonas
325 326 YP 001187408.1 mendocina ymp]
glutathione S-transferase domain- LigF
containing protein [Bradyrhizobium sp.
BTAi1] >gbIABQ35828.11 putative
glutathione S-transferase enzyme with
thioredoxin-like domain [Bradyrhizobium
327 328 YP 001239734.1 sp. BTAi1]
glutathione S-transferase domain- LigF
containing protein [Sphingomonas
wittichii RW1] >gbIABQ67801.11
Glutathione S-transferase, N-terminal
329 330 YP 001261939.1 domain [Sphingomonas wittichii RW1]
glutathione S-transferase domain- LigF
containing protein [Sphingomonas
wittichii RW1] >gbIABQ68928.11
Glutathione S-transferase, N-terminal
331 332 YP 001263066.1 domain [Sphingomonas wittichii RW1]
glutathione S-transferase domain- LigF
containing protein [Parvibaculum
lavamentivorans DS-1]
>gbIABS64709.11Glutathione S-
transferase domain [Parvibaculum
333 334 YP 001414366.1 lavamentivorans DS-1]
maleylacetoacetate isomerase LigF
[Parvibaculum lavamentivorans DS-1]
>gbIABS65181.11maleylacetoacetate
isomerase [Parvibaculum
335 336 YP 001414838.1 lavamentivorans DS-1]
glutathione S-transferase domain- LigF
containing protein [Caulobacter sp. K31]
>gbIABZ71793.11Glutathione S-
transferase domain [Caulobacter sp.
337 338 YP 001684291.1 K31]
glutathione S-transferase domain- LigF
containing protein [Methylobacterium sp.
4-46] >gblACA18150.11Glutathione S-
transferase domain [Methylobacterium
339 340 YP 001770584.1 sp. 4-46]
predicted glutathione S-transferase LigF
protein [Sinorhizobium fredii NGR234]
>gblACP27363.11 predicted glutathione
S-transferase protein [Sinorhizobium
341 342 YP 002828116.1 fredii NGR234]
glutathione S-transferase domain- LigF
containing protein [Caulobacter segnis
ATCC 21756] >gbIADG10504.11
Glutathione S-transferase domain
protein [Caulobacter segnis ATCC
343 344 YP 003593122.1 21756]
glutathione S-transferase [Pantoea LigF
vagans C9-1] >gbIAD009418.11
Glutathione S-transferase [Pantoea
345 346 YP 003930867.1 vagans C9-1]
Glutathione S-transferase domain LigF
protein [Glaciecola agarilytica 4H-3-
7+YE-5] >gbIAEE23328.11Glutathione S-
transferase domain protein [Glaciecola
347 348 YP 004434596.1 sp. 4H-3-7+YE-5]
glutathione S-transferase [Ramlibacter LigF
tataouinensis TTB310]
>gbIAEG94864.1Iglutathione S-
transferase-like protein [Ramlibacter
349 350 YP 004620883.1 tataouinensis TTB310]
glutathione S-transferase family protein LigF
[Aeromonas punctata]
>embICAG15111.11 glutathione S-
transferase family protein [Aeromonas
351 352 YP 067874.1 caviae]
glutathione S-transferase, putative LigF
[Ruegeria pomeroyi DSS-3]
>gbIAAV96533.1Iglutathione S-
transferase, putative [Ruegeria pomeroyi
353 354 YP 168502.1 DSS-3]
glutathione S-transferase LigF
[Pseudoalteromonas haloplanktis
TAC125] >embICA185615.11 putative
glutathione S-transferase
[Pseudoalteromonas haloplanktis
355 356 YP 339058.1 TAC125]
glutathione S-transferase-like [Ruegeria LigF
sp. TM1040] >gbIABF62942.11
glutathione S-transferase-like protein
357 358 YP_612204.1 [Ruegeria sp. TM1040]
glutathione S-transferase family protein LigF
[Sulfitobacter sp. EE-36]
>refIZP_00961889.11glutathione S-
transferase family protein [Sulfitobacter
sp. NAS-14.1] >gblEAP81303.11
glutathione S-transferase family protein
[Sulfitobacter sp. NAS-14.1]
>gb1EAP85807.11glutathione S-
transferase family protein [Sulfitobacter
359 360 ZP 00954574.1 sp. EE-36]
maleylacetoacetate isomerase LigF
[Oceanospirillum sp. MED92]
>gblEAR62715.11maleylacetoacetate
361 362 ZP_01165363.1 isomerase [Oceanospirillum sp. MED92]
glutathione S-transferase, putative LigF
[Roseovarius sp. TM1035]
>gbIEDM30676.11glutathione S-
transferase, putative [Roseovarius sp.
363 364 ZP 01881157.1 TM1035]
Glutathione S-transferase domain LigF
365 366 ZP 03523367.1 [Rhizobium etli GR56]
Glutathione S-transferase GST-6.0 LigF
[Yersinia ruckeri ATCC 29473]
>gblEEQ00521.11Glutathione S-
transferase GST-6.0 [Yersinia ruckeri
367 368 ZP 04614975.1 ATCC 29473]
glutathione S-transferase, N-terminal LigF
domain protein [Rhodobacteraceae
bacterium KLH11] >gblEEE36118.11
glutathione S-transferase, N-terminal
domain protein [Rhodobacteraceae
369 370 ZP 05125190.1 bacterium KLH11]
glutathione S-transferase [Silicibacter LigF
lacuscaerulensis ITI-1157]
>gblEEX09309.11glutathione S-
transferase [Silicibacter lacuscaerulensis
371 372 ZP 05786193.1 ITI-1157]
maleylacetoacetate isomerase LigF
[Asticcacaulis biprosthecum C19]
>gblEGF90974.11maleylacetoacetate
isomerase [Asticcacaulis biprosthecum
373 374 ZP 08264339.1 C19]
glutathione S-transferase LigF
[Bradyrhizobiaceae bacterium SG-6C]
>gb1EGP07427.11glutathione S-
transferase [Bradyrhizobiaceae
375 376 ZP 08630058.1 bacterium SG-6C]
glutathione S-transferase GST 16 LigF
377 378 AAG34806.1 [Glycine max]
tau class GST protein 3 [Oryza sativa LigF
Indica Group] >gb1EAY79295.11
hypothetical protein OsI_34421 [Oryza
sativa Indica Group] >embICAZ68078.11
glutathione S-transferase [Oryza sativa
379 380 AAQ02687.1 Indica Group]
Glutathione S-transferase domain LigF
381 382 ADV56298.1 protein [Shewanella putrefaciens 200]
glutathione S-transferase [Medicago LigF
383 384 BAB70616.1 sativa]
predicted protein [Hordeum vulgare LigF
385 386 BAJ94610.1 subsp. vulgare]
hypothetical protein VITISV_002763 LigF
387 388 CAN68934.1 [Vitis vinifera]
putative glutathione S-transferase LigF
389 390 CBW26056.1 [Bacteriovorax marinus SJ]
glutathione S-transferase [Coccidioides LigF
391 392 EFW18159.1 posadasii str. Silveira]
hypothetical protein LigF
BATDEDRAFT 85058
393 394 EGF84337.1 [Batrachochytrium dendrobatidis JAM81]
Os10g0528400 [Oryza sativa Japonica LigF
Group] >gb|AAG32472.1|AF309379_1
putative glutathione S-transferase
OsGSTU3 [Oryza sativa Japonica
Group] >gb|AAM12325.1|AC091680_26
putative glutathione S-transferase [Oryza
sativa Japonica Group]
>gb|AAM94544.1| putative glutathione S-
transferase [Oryza sativa Japonica
Group] >gb|AAP54745.1| glutathione S-
transferase GSTU6, putative, expressed
[Oryza sativa Japonica Group]
>dbj|BAF27038.1| Os10g0528400
[Oryza sativa Japonica Group]
>gb|EAZ16756.1| hypothetical protein
OsJ_32232 [Oryza sativa Japonica
395 396 NP 001065124.1 Group]
Glutathione S-transferase-like protein LigF
[Arabidopsis thaliana]
>embICAB83126.11Glutathione
transferase III-like protein [Arabidopsis
thaliana] >gbIAEE80388.1I Glutathione
S-transferase-like protein [Arabidopsis
397 398 NP 191835.1 thaliana]
glutathione S-transferase family protein LigF
[Shewanella oneidensis MR-1]
>gbIAAN54634.11AE015603_8
glutathione S-transferase family protein
399 400 NP 717190.1 [Shewanella oneidensis MR-1]
glutathione S-transferase LigF
[Bradyrhizobium japonicum USDA 110]
>dbj|BAC47768.1| glutathione S-
transferase [Bradyrhizobium japonicum
401 402 NP 769143.1 USDA 110]
glutathione transferase zeta 1 LigF
[Chromobacterium violaceum ATCC
12472] >gbIAAQ58646.11 probable
glutathione transferase zeta 1
[Chromobacterium violaceum ATCC
403 404 NP 900642.1 12472]
glutathione S-transferase [Coccidioides LigF
405 406 XP 001246353.1 immitis RS]
PREDICTED: similar to glutathione S- LigF
407 408 XP 002171087.1 transferase [Hydra magnipapillata]
PREDICTED: hypothetical protein [Vitis LigF
vinifera] >embICB132223.31 unnamed
409 410 XP 002263386.1 protein product [Vitis vinifera]
PREDICTED: hypothetical protein [Vitis LigF
vinifera] >embICB132222.31 unnamed
411 412 XP 002263424.1 protein product [Vitis vinifera]
PREDICTED: hypothetical protein LigF
413 414 XP 002272099.1 isoform 2 [Vitis vinifera]
glutathione s-transferase, putative LigF
[Ricinus communis] >gbIEEF34551.11
glutathione s-transferase, putative
415 416 XP 002527848.1 [Ricinus communis]
Glutathione S-transferase A, putative LigF
[Perkinsus marinus ATCC 50983]
>gb|EER18137.1| Glutathione S-
transferase A, putative [Perkinsus
417 418 XP 002786341.1 marinus ATCC 50983]
Glutathione S-transferase, putative LigF
[Coccidioides posadasii C735 delta
SOWgp] >gb|EER24644.1| Glutathione
S-transferase, putative [Coccidioides
419 420 XP 003066789.1 posadasii C735 delta SOWgp]
PREDICTED: similar to ganglioside- LigF
induced differentiation-associated-
protein 1 [Tribolium castaneum]
>gbIEFA00477.11 hypothetical protein
TcasGA2 TC003336 [Tribolium
421 422 XP 970577.1 castaneum]
glutathione S-transferase domain- LigF
containing protein [Sphingomonas
wittichii RW1] >gbIABQ69421.11
Glutathione S-transferase, N-terminal
423 424 YP 001263559.1 domain [Sphingomonas wittichii RW1]
glutathione S-transferase domain- LigF
containing protein [Shewanella pealeana
ATCC 700345] >gbIABV88497.11
Glutathione S-transferase domain
425 426 YP 001503032.1 [Shewanella pealeana ATCC 700345]
glutathione S-transferase ll LigF
[Acaryochloris marina MBIC11017]
>gbIABW27665.1Iglutathione S-
transferase ll [Acaryochloris marina
427 428 YP 001516981.1 MBIC11017]
glutathione S-transferase, [Sorangium LigF
cellulosum 'So ce 56']
>embICAN94912.1Iglutathione S-
transferase, putative [Sorangium
429 430 YP 001615392.1 cellulosum 'So ce 561
glutathione S-transferase domain- LigF
containing protein [Caulobacter sp. K31]
>gbIABZ73058.11Glutathione S-
transferase domain [Caulobacter sp.
431 432 YP 001685556.1 K31]
glutathione S-transferase domain- LigF
containing protein [Pseudomonas putida
W619] >gb|ACA71685.1| Glutathione S-
transferase domain [Pseudomonas
433 434 YP_001748054.1 putida W619]
glutathione S-transferase [Cyanothece LigF
sp. ATCC 51142] >gb|ACB52305.1|
glutathione S-transferase [Cyanothece
435 436 YP_001804371.1 sp. ATCC 51142]
glutathione s-transferase protein; gsta LigF
protein [Cupriavidus taiwanensis LMG
19424] >emb|CAQ71222.1| putative
glutathione S-transferase protein; gstA
protein [Cupriavidus taiwanensis LMG
437 438 YP_002007283.1 19424]
glutathione S-transferase LigF
[Phenylobacterium zucineum HLK1]
>gb|ACG78383.1| glutathione S-
transferase [Phenylobacterium zucineum
439 440 YP_002130812.1 HLK1]
glutathione S-transferase domain LigF
[Acidithiobacillus ferrooxidans ATCC
53993] >ref|YP_002426974.1|
glutathione S-transferase
[Acidithiobacillus ferrooxidans ATCC
23270] >gb|ACH84426.1| Glutathione S-
transferase domain [Acidithiobacillus
ferrooxidans ATCC 53993]
>gb|ACK78121.1| glutathione S-
transferase [Acidithiobacillus
441 442 YP_002220633.1 ferrooxidans ATCC 23270]
glutathione S-transferase domain- LigF
containing protein [Cyanothece sp. PCC
7425] >gb|ACL44057.1| Glutathione S-
transferase domain protein [Cyanothece
443 444 YP_002482418.1 sp. PCC 7425]
glutathione S-transferase protein LigF
[Agrobacterium radiobacter K84]
>gb|ACM25821.1| glutathione S-
transferase protein [Agrobacterium
445 446 YP_002543747.1 radiobacter K84]
glutathione S-transferase domain protein LigF
[Rhizobium leguminosarum bv. trifolii
WSM1325] >gb|ACS55200.1|
Glutathione S-transferase domain
protein [Rhizobium leguminosarum bv.
447 448 YP_002974739.1 trifolii WSM1325]
glutathione transferase LigF
[Pseudoalteromonas sp. SM9913]
>gb|ADT70298.1| glutathione
transferase [Pseudoalteromonas sp.
449 450 YP_004065207.1 SM9913]
glutathione S-transferase [Pseudomonas LigF
brassicacearum subsp. brassicacearum
NFM421] >gb|AEA72175.1| putative
glutathione S-transferase [Pseudomonas
brassicacearum subsp. brassicacearum
451 452 YP_004357179.1 NFM421]
glutathione S-transferase [Cupriavidus LigF
necator N-1] >gb|AEI79688.1|
glutathione S-transferase [Cupriavidus
453 454 YP_004680920.1 necator N-1]
glutathione S-transferase [Rhizobium etli LigF
CFN 42] >gb|ABC90083.1| glutathione S-
transferase protein [Rhizobium etli CFN
455 456 YP_468810.1 42]
glutathione S-transferase [Burkholderia LigF
xenovorans LB400] >gb|ABE34690.1|
Glutathione S-transferase [Burkholderia
457 458 YP_554040.1 xenovorans LB400]
glutathione S-transferase-like [Ruegeria LigF
sp. TM1040] >gb|ABF62841.1|
glutathione S-transferase-like protein
459 460 YP_612103.1 [Ruegeria sp. TM1040]
glutathione S-transferase domain- LigF
containing protein [Shewanella sp. MR-4]
>gb|ABI40253.1| Glutathione S-
transferase, N-terminal domain protein
461 462 YP_735310.1 [Shewanella sp. MR-4]
glutathione S-transferase domain- LigF
containing protein [Nitrosomonas
eutropha C91] >gb|ABI59602.1|
Glutathione S-transferase, C-terminal
463 464 YP_747567.1 domain [Nitrosomonas eutropha C91]
maleylacetoacetate isomerase LigF
[Maricaulis maris MCS10]
>gb|ABI66289.1| maleylacetoacetate
465 466 YP_757227.1 isomerase [Maricaulis maris MCS10]
glutathione S-transferase domain- LigF
containing protein [Shewanella sp. ANA-
3] >gb|ABK46993.1| Glutathione S-
transferase, N-terminal domain protein
467 468 YP_868399.1 [Shewanella sp. ANA-3]
glutathione S-transferase domain- LigF
containing protein [Shewanella sp. ANA-
3] >gb|ABK49092.1| Glutathione S-
transferase, N-terminal domain protein
469 470 YP_870498.1 [Shewanella sp. ANA-3]
glutathione S-transferase domain- LigF
containing protein [Marinobacter
aquaeolei VT8] >gb|ABM17524.1|
Glutathione S-transferase, N-terminal
471 472 YP_957711.1 domain [Marinobacter aquaeolei VT8]
glutathione S-transferase domain- LigF
containing protein [Marinobacter
aquaeolei VT8] >gb|ABM17686.1|
Glutathione S-transferase, N-terminal
473 474 YP_957873.1 domain [Marinobacter aquaeolei VT8]
glutathione S-transferase domain- LigF
containing protein [Marinobacter
aquaeolei VT8] >gb|ABM20606.1|
Glutathione S-transferase, N-terminal
475 476 YP_960793.1 domain [Marinobacter aquaeolei VT8]
glutathione S-transferase domain- LigF
containing protein [Shewanella sp. W3-
18-1] >gb|ABM24864.1| Glutathione S-
transferase, N-terminal domain
477 478 YP_963418.1 [Shewanella sp. W3-18-1]
glutathione S-transferase family protein LigF
[Oceanicola batsensis HTCC2597]
>gb|EAQ02499.1| glutathione S-
transferase family protein [Oceanicola
479 480 ZP_01000028.1 batsensis HTCC2597]
glutathione S-transferase [Stigmatella LigF
aurantiaca DW4/3-1]
>ref|YP_003956548.1| glutathione s-
transferase [Stigmatella aurantiaca
DW4/3-1] >gb|EAU70026.1| glutathione
S-transferase [Stigmatella aurantiaca
DW4/3-1] >gb|ADO74721.1| Glutathione
S-transferase [Stigmatella aurantiaca
481 482 ZP_01459182.1 DW4/3-1]
Glutathione S-transferase domain LigF
[Burkholderia graminis C4D1M]
>gb|EDT08402.1| Glutathione S-
transferase domain [Burkholderia
483 484 ZP_02886014.1 graminis C4D1M]
Glutathione S-transferase [Alteromonas LigF
485 486 ZP_04713937.1 macleodii ATCC 27126]
Glutathione S-transferase, N-terminal LigF
domain protein [Rhodobacterales
bacterium HTCC2083] >gb|EDZ42709.1|
Glutathione S-transferase, N-terminal
domain protein [Rhodobacteraceae
487 488 ZP_05075049.1 bacterium HTCC2083]
glutathione S-transferase protein LigF
[Roseobacter sp. GAI101]
>gb|EEB85730.1| glutathione S-
transferase protein [Roseobacter sp.
489 490 ZP_05101428.1 GAI101]
glutathione S-transferase LigF
[Rhodobacteraceae bacterium KLH11]
>gb|EEE39034.1| glutathione S-
transferase [Rhodobacteraceae
491 492 ZP_05124402.1 bacterium KLH11]
glutathione S-transferase [Vibrio sp. LigF
RC341] >gb|EEX64947.1| glutathione S-
493 494 ZP_05926645.1 transferase [Vibrio sp. RC341]
Glutathione S-transferase-like protein LigF
[Cylindrospermopsis raciborskii CS-505]
>gb|EFA69058.1| Glutathione S-
transferase-like protein
495 496 ZP_06308936.1 [Cylindrospermopsis raciborskii CS-505]
Glutathione S-transferase domain LigF
protein [Burkholderia sp. Ch1-1]
>gb|EFG73275.1| Glutathione S-
transferase domain protein [Burkholderia
497 498 ZP_06838829.1 sp. Ch1-1]
glutathione S-transferase III [Vibrio LigF
sinaloensis DSM 21326]
>gb|EGA68654.1| glutathione S-
transferase III [Vibrio sinaloensis DSM
499 500 ZP_08104209.1 21326]
Glutathione S-transferase LigF
[Oxalobacteraceae bacterium
IMCC9480] >gb|EGF30821.1|
Glutathione S-transferase
[Oxalobacteraceae bacterium
501 502 ZP_08275708.1 IMCC9480]
glutathione S-transferase LigF
[Pseudoalteromonas haloplanktis
ANT/505] >gb|EGI73123.1| glutathione S-
transferase [Pseudoalteromonas
503 504 ZP_08409706.1 haloplanktis ANT/505]
glutathione S-transferase [Shewanella LigF
sp. HN-41] >gb|EGM70872.1|
glutathione S-transferase [Shewanella
505 506 ZP_08565123.1 sp. HN-41]
507 508 CAA12269.1 ORF 3 [Sphingomonas sp. RW5] LigF
glutathione transferase [Triticum LigF
509 510 CAC94002.1 aestivum]
maleylacetoacetate isomerase / LigF
glutathione S-transferase [Bdellovibrio
bacteriovorus HD100]
>emb|CAE77948.1| maleylacetoacetate
isomerase / glutathione S-transferase
511 512 NP_967294.1 [Bdellovibrio bacteriovorus HD100]
RecName: Full=Protein ligF LigF
>dbj|BAA02031.1| beta-etherase
[Sphingomonas paucimobilis]
513 514 P30347.1 >prf||1914145A beta etherase
hypothetical protein LigF
SELMODRAFT_142654 [Selaginella
moellendorffii] >gb|EFJ34604.1|
hypothetical protein
SELMODRAFT_142654 [Selaginella
515 516 XP_002964271.1 moellendorffii]
glutathione S-transferase-like protein LigF
[Methylibium petroleiphilum PM1]
>gb|ABM95079.1| glutathione S-
transferase-like protein [Methylibium
517 518 YP_001021314.1 petroleiphilum PM1]
glutathione S-transferase domain- LigF
containing protein [Burkholderia
phymatum STM815] >gb|ACC75341.1|
Glutathione S-transferase domain
519 520 YP_001862387.1 [Burkholderia phymatum STM815]
glutathione S-transferase LigF
[Phenylobacterium zucineum HLK1]
>gb|ACG78321.1| glutathione S-
transferase [Phenylobacterium zucineum
521 522 YP_002130750.1 HLK1]
glutathione S-transferase [Sinorhizobium LigF
fredii NGR234] >gb|ACP24502.1|
glutathione S-transferase [Sinorhizobium
523 524 YP_002825255.1 fredii NGR234]
glutathione S-transferase domain- LigF
containing protein [Burkholderia sp.
CCGE1003] >gb|ADN59379.1|
Glutathione S-transferase domain
525 526 YP_003908670.1 protein [Burkholderia sp. CCGE1003]
glutathione s-transferase domain- LigF
containing protein [Variovorax paradoxus
EPS] >gb|ADU36319.1| Glutathione S-
transferase domain [Variovorax
527 528 YP_004154430.1 paradoxus EPS]
glutathione S-transferase domain- LigF
containing protein [Burkholderia sp.
CCGE1001] >gb|ADX56921.1|
Glutathione S-transferase domain
529 530 YP_004229981.1 protein [Burkholderia sp. CCGE1001]
glutathione S-transferase, N-terminal LigF
domain protein [Polymorphum gilvum
SL003B-26A1] >gb|ADZ69468.1|
Glutathione S-transferase, N-terminal
domain protein [Polymorphum gilvum
531 532 YP_004302768.1 SL003B-26A1]
glutathione S-transferase-like protein LigF
[Novosphingobium sp. PP1Y]
>emb|CCA92074.1| glutathione S-
transferase-like [Novosphingobium sp.
533 534 YP_004533892.1 PP1Y]
glutathione S-transferase-like protein LigF
[Novosphingobium sp. PP1Y]
>emb|CCA92075.1| glutathione S-
transferase-like [Novosphingobium sp.
535 536 YP_004533893.1 PP1Y]
glutathione S-transferase-like protein LigF
[Novosphingobium sp. PP1Y]
>emb|CCA92087.1| glutathione S-
transferase-like [Novosphingobium sp.
537 538 YP_004533905.1 PP1Y]
glutathione S-transferase-like protein LigF
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD26530.1| glutathione S-
transferase-like protein
[Novosphingobium aromaticivorans DSM
539 540 YP_497364.1 12444]
glutathione S-transferase-like protein LigF
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD27301.1| glutathione S-
transferase-like protein
[Novosphingobium aromaticivorans DSM
541 542 YP_498135.1 12444]
glutathione S-transferase-like protein LigF
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD27308.1| glutathione S-
transferase-like protein
[Novosphingobium aromaticivorans DSM
543 544 YP_498142.1 12444]
glutathione S-transferase-like protein LigF
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD27309.1| glutathione S-
transferase-like protein
[Novosphingobium aromaticivorans DSM
545 546 YP_498143.1 12444]
maleylacetoacetate isomerase LigF
[Oceanicaulis alexandrii HTCC2633]
>gb|EAP91525.1| maleylacetoacetate
isomerase [Oceanicaulis alexandrii
547 548 ZP_00952372.1 HTCC2633]
glutathione S-transferase, putative LigF
[Roseovarius nubinhibens ISM]
>gb|EAP78164.1| glutathione S-
transferase, putative [Roseovarius
549 550 ZP_00959702.1 nubinhibens ISM]
glutathione S-transferase, putative LigF
[Roseovarius sp. 217] >gb|EAQ27224.1|
glutathione S-transferase, putative
551 552 ZP_01034543.1 [Roseovarius sp. 217]
glutathione S-transferase, putative LigF
[Roseobacter sp. MED193]
>gb|EAQ44057.1| glutathione S-
transferase, putative [Roseobacter sp.
553 554 ZP_01057917.1 MED193]
glutathione S-transferase [marine LigF
gamma proteobacterium HTCC2207]
>gb|EAS48069.1| glutathione S-
transferase [marine gamma
555 556 ZP_01223510.1 proteobacterium HTCC2207]
glutathione S-transferase, putative LigF
[Roseobacter sp. SK209-2-6]
>gb|EBA17470.1| glutathione S-
transferase, putative [Roseobacter sp.
557 558 ZP_01753989.1 SK209-2-6]
glutathione S-transferase-like protein LigF
[Phaeobacter gallaeciensis BS107]
>gb|EDQ11817.1| glutathione S-
transferase-like protein [Phaeobacter
559 560 ZP_02146800.1 gallaeciensis BS107]
glutathione S-transferase, putative LigF
[Phaeobacter gallaeciensis 2.10]
>gb|EDQ07480.1| glutathione S-
transferase, putative [Phaeobacter
561 562 ZP_02150992.1 gallaeciensis 2.10]
glutathione S-transferase 2 LigF
[Rhodobacterales bacterium HTCC2083]
>gb|EDZ41252.1| glutathione S-
transferase 2 [Rhodobacteraceae
563 564 ZP_05073592.1 bacterium HTCC2083]
glutathione S-transferase LigF
[Rhodobacterales bacterium Y4I]
>gb|EDZ45430.1| glutathione S-
transferase [Rhodobacterales bacterium
565 566 ZP_05077451.1 Y4I]
Glutathione S-transferase, N-terminal LigF
domain protein [Pseudovibrio sp. JE062]
>gb|EEA92555.1| Glutathione S-
transferase, N-terminal domain protein
567 568 ZP_05087035.1 [Pseudovibrio sp. JE062]
glutathione S-transferase [Ruegeria sp. LigF
R11] >gb|EEB71116.1| glutathione S-
569 570 ZP_05089424.1 transferase [Ruegeria sp. R11]
protein LigF [gamma proteobacterium LigF
NOR5-3] >gb|EED32863.1| protein LigF
571 572 ZP_05126316.1 [gamma proteobacterium NOR5-3]
maleylacetoacetate isomerase [gamma LigF
proteobacterium NOR5-3]
>gb|EED33370.1| maleylacetoacetate
isomerase [gamma proteobacterium
573 574 ZP_05126823.1 NOR5-3]
glutathione S-transferase [Silicibacter sp. LigF
TrichCH4B] >gb|EEW58747.1|
glutathione S-transferase [Silicibacter sp.
575 576 ZP_05741946.1 TrichCH4B]
[00257] Table 18.
PROTEIN GENE GENBANK DESCRIPTION: TYPE
SEQ ID SEQ ID ACCESSION
NO: NO: NO:
glutathione S-transferase homolog LigG
577 578 BAA77216.1 [Sphingomonas paucimobilis]
glutathione S-transferase family protein LigG
[Novosphingobium sp. PP1Y]
>emb|CCA92089.1| glutathione S-
579 580 YP_004533907.1 transferase family protein
glutathione S-transferase family protein LigG
[Thiobacillus denitrificans ATCC 25259]
>gb|AAZ97003.1| glutathione S-
581 582 YP_314808.1 transferase family protein [Thiobacillus
glutathione S-transferase family protein LigG
[Ruegeria pomeroyi DSS-3]
>gb|AAV95330.1| glutathione S-
583 584 YP_167289.1 transferase family protein [Ruegeria
glutathione S-transferase family protein LigG
[Maritimibacter alkaliphilus HTCC2654]
>gb|EAQ14262.1| glutathione S-
585 586 ZP_01011943.1 transferase family protein
glutathione S-transferase protein LigG
[Agrobacterium radiobacter K84]
587 588 YP_002540613.1 >gb|ACM29018.1| glutathione S-
Novel glutathione S-transferase omega LigG
589 590 CAJ81793.1 protein [Xenopus (Silurana) tropicalis]
glutathione S-transferase omega 2 LigG
[Xenopus (Silurana) tropicalis]
591 592 NP_001005086.1 >gb|AAH77010.1| MGC89704 protein
PREDICTED: glutathione S-transferase LigG
593 594 XP_624501.1 omega-1 [Apis mellifera]
GM24932 [Drosophila sechellia] LigG
595 596 XP_002029736.1 >gb|EDW40722.1| GM24932 [Drosophila
hypothetical protein LOC436894 [Danio LigG
597 598 NP_001002621.1 rerio] >gb|AAH75965.1| Zgc:92254 [Danio
predicted protein [Pediculus humanus LigG
corporis] >gb|EEB18748.1| predicted
599 600 XP_002431486.1 protein [Pediculus humanus corporis]
glutathione S-transferase [Glossina LigG
601 602 ADD18952.1 morsitans morsitans]
GE21298 [Drosophila yakuba] LigG
603 604 XP_002093444.1 >gb|EDW93156.1| GE21298 [Drosophila
GK20540 [Drosophila willistoni] LigG
605 606 XP_002068563.1 >gb|EDW79549.1| GK20540 [Drosophila
607 608 NP_001165912.1 glutathione S-transferase O1 [Nasonia LigG
putative glutathione S-transferase LigG
609 610 CAM34501.1 [Cotesia congregata]
PREDICTED: similar to glutathione-S- LigG
611 612 XP_421747.1 transferase homolog isoform 2 [Gallus
GA23449 [Drosophila pseudoobscura LigG
pseudoobscura] >gb|EDY73696.1|
613 614 XP_002135069.1 GA23449 [Drosophila pseudoobscura
glutathione S-transferase omega-1 [Mus LigG
musculus]
>sp|O09131.2|GSTO1_MOUSE
RecName: Full=Glutathione S-transferase
omega-1; Short=GSTO-1; AltName:
Full=p28 >gb|AAB70110.1| glutathione-S-
transferase homolog [Mus musculus]
>dbj|BAC25667.1| unnamed protein
product [Mus musculus]
>gb|AAH85165.1| Glutathione S-
transferase omega 1 [Mus musculus]
>dbj|BAE27469.1| unnamed protein
615 616 NP_034492.1 product [Mus musculus]
glutathione S-transferase domain- LigG
617 618 ZP_03524422.1 containing protein [Rhizobium etli GR56]
CG6673, isoform A [Drosophila LigG
melanogaster] >gb|AAF50404.2| CG6673,
isoform A [Drosophila melanogaster]
619 620 NP_729388.1 >gb|ACZ02426.1| glutathione S-
glutathione S-transferase [Xanthomonas LigG
vesicatoria ATCC 35937]
621 622 ZP_08179398.1 >gb|EGD08414.1| glutathione S-
PREDICTED: glutathione S-transferase LigG
623 624 XP_003218563.1 omega-1-like isoform 1 [Anolis
625 626 ABC86304.1 IP16242p [Drosophila melanogaster] LigG
GL15567 [Drosophila persimilis] LigG
627 628 XP_002026470.1 >gb|EDW33419.1| GL15567 [Drosophila
glutathione S-transferase omega 4 LigG
[Bombyx mori] >gb|ABY66601.1|
629 630 NP_001108461.1 glutathione S-transferase 13 [Bombyx
glutathione S-transferase omega-1 [Sus LigG
scrofa] >ref|XP_001929519.1|
PREDICTED: glutathione S-transferase
omega-1-like [Sus scrofa]
>sp|Q9N1F5.2|GSTO1_PIG RecName:
Full=Glutathione S-transferase omega-1;
Short=GSTO-1; AltName:
Full=Glutathione-dependent
631 632 NP_999215.1 dehydroascorbate reductase
hypothetical protein LOC492500 [Danio LigG
rerio] >gb|AAH85467.1| Zgc:101897
633 634 NP_001007373.1 [Danio rerio] >gb|AAI65433.1| Zgc:101897
glutathione S-transferase domain- LigG
containing protein [Delftia acidovorans
SPH-1] >gb|ABX38269.1| Glutathione S-
635 636 YP_001566654.1 transferase domain [Delftia acidovorans
omega class glutathione S-transferase LigG
637 638 ADY80021.1 [Oplegnathus fasciatus]
glutathione S-transferase domain- LigG
containing protein [Sinorhizobium
medicae WSM419] >gb|ABR62323.1|
639 640 YP_001329158.1 Glutathione S-transferase domain
hypothetical protein LOC431979 LigG
[Xenopus laevis] >gb|AAH70673.1|
641 642 NP_001084924.1 MGC82327 protein [Xenopus laevis]
PREDICTED: glutathione S-transferase LigG
643 644 XP_003396907.1 omega-1-like [Bombus terrestris]
PREDICTED: glutathione S-transferase LigG
645 646 XP_001368758.1 omega-1-like isoform 1 [Monodelphis
GH16193 [Drosophila grimshawi] LigG
647 648 XP_001983981.1 >gb|EDV96329.1| GH16193 [Drosophila
649 650 ADK66966.1 glutathione s-transferase [Chironomus LigG
PREDICTED: similar to glutathione-S- LigG
651 652 XP_001232808.1 transferase homolog isoform 1 [Gallus
GK20354 [Drosophila willistoni] LigG
653 654 XP_002068565.1 >gb|EDW79551.1| GK20354 [Drosophila
hypothetical protein sce0602 [Sorangium LigG
cellulosum 'So ce 56'] >emb|CAN90759.1|
655 656 YP_001611239.1 gst2 [Sorangium cellulosum 'So ce 56']
PREDICTED: glutathione S-transferase LigG
657 658 XP_001499427.'4. omega-1-like isoform 1 [Equus caballus]
putative glutathione S-transferase protein LigG
[Sinorhizobium meliloti 1021]
>ref|YP_004550950.1| glutathione S-
transferase domain-containing protein
[Sinorhizobium meliloti AK83]
>emb|CAC41740.1| Putative glutathione
S-transferase [Sinorhizobium meliloti
1021] >gb|AEG06303.1| Glutathione S-
transferase domain protein
[Sinorhizobium meliloti BL225C]
659 660 NP_384409.1 >gb|AEG55336.1| Glutathione S-
661 662 CAG05035.1 unnamed protein product [Tetraodon LigG
hypothetical protein PaerPA_01002475 LigG
[Pseudomonas aeruginosa PACS2]
>ref|YP_002440902.1|
maleylacetoacetate isomerase
[Pseudomonas aeruginosa LESB58]
>ref|ZP_04928412.1| maleylacetoacetate
isomerase [Pseudomonas aeruginosa
C3719] >gb|EAZ52531.1|
maleylacetoacetate isomerase
[Pseudomonas aeruginosa C3719]
>emb|CAW28043.1| maleylacetoacetate
663 664 ZP_01365353.1 isomerase [Pseudomonas aeruginosa
maleylacetoacetate isomerase LigG
[Pseudomonas aeruginosa PA7]
>gb|ABR84080.1| maleylacetoacetate
665 666 YP_001348642.1 isomerase [Pseudomonas aeruginosa
maleylacetoacetate isomerase LigG
[Pseudomonas aeruginosa 2192]
>gb|EAZ57884.1| maleylacetoacetate
667 668 ZP_04933765.1 isomerase [Pseudomonas aeruginosa
maleylacetoacetate isomerase LigG
[Pseudomonas aeruginosa PAO1]
>sp|P57109.1|MAAI_PSEAE RecName:
Full=Maleylacetoacetate isomerase;
Short=MAAI
669 670 NP_250697.1 >gb|AAG05395.1|AE004627_3
hypothetical protein LigG
671 672 EFN59352.1 CHLNCDRAFT_137800 [Chlorella
glutathione S-transferase domain- LigG
containing protein [Variovorax paradoxus
S110] >gb|ACS20318.1| Glutathione S-
673 674 YP_002945584.1 transferase domain protein [Variovorax
PREDICTED: glutathione S-transferase LigG
675 676 XP_002197460.1 omega 1 [Taeniopygia guttata]
GG15075 [Drosophila erecta] LigG
677 678 XP_001971643.1 >gb|EDV50669.1| GG15075 [Drosophila
glutathione S-transferase omega-1-like LigG
[Acyrthosiphon pisum] >dbj|BAH71013.1|
679 680 NP_001155757.1 ACYPI008340 [Acyrthosiphon pisum]
GL15565 [Drosophila persimilis] LigG
681 682 XP_002026468.1 >gb|EDW33417.1| GL15565 [Drosophila
GA19760 [Drosophila pseudoobscura LigG
pseudoobscura] >gb|EAL29555.1|
683 684 XP_001353820.1 GA19760 [Drosophila pseudoobscura
maleylacetoacetate isomerase LigG
[Pseudomonas aeruginosa UCBPP-PA14]
>gb|ABJ11194.1| maleylacetoacetate
685 686 YP_791232.1 isomerase [Pseudomonas aeruginosa
maleylacetoacetate isomerase LigG
[Pseudomonas aeruginosa PAb1]
>ref|ZP_07797003.1| maleylacetoacetate
isomerase [Pseudomonas aeruginosa
39016] >gb|EFQ42099.1|
maleylacetoacetate isomerase
[Pseudomonas aeruginosa 39016]
687 688 ZP_06879058.1 >gb|EGM14661.1| maleylacetoacetate
689 690 EFZ22366.1 hypothetical protein SINV_14968 LigG
Glutathione S-transferase domain LigG
691 692 ZP_03527925.1 [Rhizobium etli CIAT 894]
693 694 ABD77536.1 hypothetical protein [Ictalurus punctatus] LigG
PREDICTED: glutathione S-transferase LigG
695 696 XP_002756473.1 omega-1-like [Callithrix jacchus]
predicted protein [Nematostella vectensis] LigG
>gb|EDO44933.1| predicted protein
697 698 XP_001636996.1 [Nematostella vectensis]
glutathione S-transferase [Rhizobium etli LigG
CFN 42] >gb|ABC89104.1| glutathione S-
699 700 YP_467831.1 transferase protein [Rhizobium etli CFN
glutathione-S-transferase [Mesorhizobium LigG
loti MAFF303099] >dbj|BAB48791.1|
701 702 NP_103005.1 glutathione-S-transferase [Mesorhizobium
703 704 ADY47623.1 Glutathione transferase omega-1 [Ascaris LigG
705 706 BAG36430.1 unnamed protein product [Homo sapiens] LigG
PREDICTED: glutathione-S-transferase LigG
707 708 XP_002718774.1 omega 1 [Oryctolagus cuniculus]
Chain A, Crystal Structure Of Human LigG
Glutathione Transferase Omega 1, Delta
155 >pdb|3LFL|B Chain B, Crystal
Structure Of Human Glutathione
Transferase Omega 1, Delta 155
709 710 3LFL_A >pdb|3LFL|C Chain C, Crystal Structure
PREDICTED: glutathione S-transferase LigG
omega-1-like [Macaca mulatta]
711 712 XP_002805857.1 >gb|ABO21635.1| glutathione S-
glutathione S-transferase omega-1 LigG
[Rattus norvegicus] >gb|AAH79363.1|
Glutathione S-transferase omega 1
[Rattus norvegicus] >gb|EDL94393.1|
713 714 NP_001007603.1 glutathione S-transferase omega 1,
PREDICTED: similar to glutathione-S- LigG
715 716 XP_535007.1 transferase omega 1 isoform 1 [Canis
glutathione S-transferase omega-1 LigG
isoform 1 [Homo sapiens]
>sp|P78417.2|GSTO1_HUMAN
RecName: Full=Glutathione S-transferase
omega-1; Short=GSTO-1 >pdb|1EEM|A
Chain A, Glutathione Transferase From
Homo Sapiens
>gb|AAF73376.1|AF212303_1 glutathione
transferase omega [Homo sapiens]
>gb|AAB70109.1| glutathione-S-
transferase homolog [Homo sapiens]
>gb|AAH00127.1| Glutathione S-
transferase omega 1 [Homo sapiens]
>gb|AAV68046.1| glutathione S-
717 718 NP_004823.1 transferase omega 1-1 [Homo sapiens]
PREDICTED: glutathione S-transferase LigG
719 720 XP_002758417.1 omega-1-like [Callithrix jacchus]
PREDICTED: glutathione S-transferase LigG
721 722 XP_003218564.1 omega-1-like isoform 2 [Anolis
Glutathione transferase omega-1 LigG
723 724 EFN62827.1 [Camponotus floridanus]
PREDICTED: glutathione S-transferase LigG
725 726 XP_508020.3 omega-1 isoform 3 [Pan troglodytes]
727 728 CAD97673.1 hypothetical protein [Homo sapiens] LigG
glutathione S-transferase omega 1 LigG
729 730 BAJ20927.1 [synthetic construct]
731 732 ACR43779.1 glutathione S-transferase [Chironomus LigG
RecName: Full=Glutathione S-transferase LigG
omega-1; Short=GSTO-1; AltName:
Full=Glutathione-dependent
dehydroascorbate reductase
733 734 Q9Z339.2 >gb|ACI32122.1| glutathione S-
GF10159 [Drosophila ananassae] LigG
735 736 XP_001956909.1 >gb|EDV39715.1| GF10159 [Drosophila
hypothetical protein [Monosiga brevicollis LigG
MX1] >gb|EDQ92516.1| predicted protein
737 738 XP_001742278.1 [Monosiga brevicollis MX1]
PREDICTED: glutathione S-transferase LigG
739 740 XP_002821176.1 omega-1-like [Pongo abelii]
PREDICTED: glutathione S-transferase LigG
741 742 XP_003255483.1 omega-1-like isoform 1 [Nomascus
glutathione S-transferase-like protein LigG
[Anabaena variabilis ATCC 29413]
>gb|ABA24595.1| Glutathione S-
743 744 YP_325490.1 transferase-like protein [Anabaena
PREDICTED: glutathione S-transferase LigG
745 746 XP_003208190.1 omega-1-like [Meleagris gallopavo]
GK20539 [Drosophila willistoni] LigG
747 748 XP_002068562.1 >gb|EDW79548.1| GK20539 [Drosophila
GF10161 [Drosophila ananassae] LigG
749 750 XP_001956911.1 >gb|EDV39717.1| GF10161 [Drosophila
gluthathione S-transferase omega LigG
751 752 ABV24048.1 [Takifugu obscurus]
putative glutathione S-transferase protein LigG
[Pseudovibrio sp. JE062]
>gb|EEA93528.1| putative glutathione S-
753 754 ZP_05086262.1 transferase protein [Pseudovibrio sp.
755 756 AAI28951.1 LOC100037104 protein [Xenopus laevis] LigG
GF10160 [Drosophila ananassae] LigG
757 758 XP_001956910.1 >gb|EDV39716.1| GF10160 [Drosophila
glutathione S-transferase omega 2 LigG
[Xenopus laevis] >gb|AAI53758.1|
759 760 NP_001099052.1 LOC100037104 protein [Xenopus laevis]
Glutathione S-transferase domain LigG
761 762 ZP_03503214.1 [Rhizobium etli Kim 5]
GJ12198 [Drosophila virilis] LigG
763 764 XP_002046961.1 >gb|EDW69303.1| GJ12198 [Drosophila
GF24331 [Drosophila ananassae] LigG
765 766 XP_001956912.1 >gb|EDV39718.1| GF24331 [Drosophila
PREDICTED: glutathione S-transferase LigG
767 768 XP_001368790.1 omega-1-like isoform 1 [Monodelphis
Glutathione S-transferase-like protein LigG
[Cylindrospermopsis raciborskii CS-505]
>gb|EFA69058.1| Glutathione S-
769 770 ZP_06308936.1 transferase-like protein
glutathione S-transferase omega 1 LigG
[Bombyx mandarina] >dbj|BAF91356.1|
771 772 ABJ15788.1 omega-class glutathione S-transferase
glutathione S-transferase omega 2 LigG
[Bombyx mori] >gb|ABC79689.1|
773 774 NP_001037406.1 glutathione S-transferase 6 [Bombyx mori]
glutathione S-transferase omega 1 LigG
[Bombyx mori] >gb|ABD36128.1|
775 776 NP_001040131.1 glutathione S-transferase omega 1
[00258] Table 19.
PROTEIN GENE GENBANK DESCRIPTION: TYPE
SEQ ID SEQ ID ACCESSION
NO: NO: NO:
RecName: Full=C alpha-dehydrogenase LigD
>dbj|BAA02030.1| C alpha-dehydrogenase
[Sphingomonas paucimobilis]
>dbj|BAA01953.1| C alpha-dehydrogenase
[Sphingomonas paucimobilis]
>gb|AAC60455.1| C alpha-dehydrogenase
777 778 Q01198.1 [Sphingomonas paucimobilis]
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD24653.1| short-chain
dehydrogenase/reductase SDR
[Novosphingobium aromaticivorans DSM
779 780 YP_495487.1 12444]
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium sp. PP1Y]
>emb|CCA92080.1| short-chain
dehydrogenase/reductase SDR
781 782 YP_004533898.1 [Novosphingobium sp. PP1Y]
Calpha-dehydrogenase [Sphingobium sp. LigD
783 784 BAH56687.1 SYK-6]
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium sp. PP1Y]
>emb|CCA92103.1| short-chain
dehydrogenase/reductase SDR
785 786 YP_004533921.1 [Novosphingobium sp. PP1Y]
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD25238.1| short-chain
dehydrogenase/reductase SDR
[Novosphingobium aromaticivorans DSM
787 788 YP_496072.1 12444]
Chain A, Structure Of Putative Short-Chain LigD
Dehydrogenase (Saro_0793) From
Novosphingobium Aromaticivorans
>pdb|310Y|B Chain B, Structure Of Putative
Short-Chain Dehydrogenase (Saro_0793)
789 790 310Y_A From Novosphingobium Aromaticivorans
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD25239.1| short-chain
dehydrogenase/reductase SDR
[Novosphingobium aromaticivorans DSM
791 792 YP_496073.1 12444]
Calpha-dehydrogenase [Sphingobium sp. LigD
793 794 BAH56683.1 SYK-6]
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium sp. PP1Y]
>emb|CCA92102.1| short-chain
dehydrogenase/reductase SDR
795 796 YP_004533920.1 [Novosphingobium sp. PP1Y]
short-chain dehydrogenase/reductase SDR LigD
[Caulobacter segnis ATCC 21756]
>gb|ADG10214.1| short-chain
dehydrogenase/reductase SDR [Caulobacter
797 798 YP_003592832.1 segnis ATCC 21756]
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD25150.1| short-chain
dehydrogenase/reductase SDR
[Novosphingobium aromaticivorans DSM
799 800 YP_495984.1 12444]
short-chain dehydrogenase/reductase SDR LigD
[Novosphingobium aromaticivorans DSM
12444] >gb|ABD26315.1| short-chain
dehydrogenase/reductase SDR
[Novosphingobium aromaticivorans DSM
801 802 YP_497149.1 12444]
short-chain dehydrogenase/reductase SDR LigD
[Caulobacter segnis ATCC 21756]
>gb|ADG10212.1| short-chain
dehydrogenase/reductase SDR [Caulobacter
803 804 YP_003592830.1 segnis ATCC 21756]
short-chain dehydrogenase/reductase SDR LigD
[Sphingomonas wittichii RW1]
>gb|ABQ66748.1| short-chain
dehydrogenase/reductase SDR
805 806 YP_001260886.1 [Sphingomonas wittichii RW1]
807 808 YP_001413979.1 short-chain dehydrogenase/reductase SDR [Parvibaculum lavamentivorans DS-1] >gb|ABS64322.1| short-chain dehydrogenase/reductase SDR [Parvibaculum lavamentivorans DS-1] LigD
809 810 YP_001412300.1 short-chain dehydrogenase/reductase SDR [Parvibaculum lavamentivorans DS-1] >gb|ABS62643.1| short-chain dehydrogenase/reductase SDR [Parvibaculum lavamentivorans DS-1] LigD
811 812 YP_001412299.1 short-chain dehydrogenase/reductase SDR [Parvibaculum lavamentivorans DS-1] >gb|ABS62642.1| short-chain dehydrogenase/reductase SDR [Parvibaculum lavamentivorans DS-1] LigD
813 814 BAH56685.1 Calpha-dehydrogenase [Sphingobium sp. SYK-6] LigD
815 816 NP_959644.1 short chain dehydrogenase [Mycobacterium avium subsp. paratuberculosis K-10] >ref|YP_880159.1| short chain dehydrogenase [Mycobacterium avium 104] >ref|ZP_05215302.1| short chain dehydrogenase [Mycobacterium avium subsp. avium ATCC 25291] >gb|AAS03027.1| hypothetical protein MAP_0710c [Mycobacterium avium subsp. paratuberculosis K-10] >gb|ABK67661.1| short chain dehydrogenase [Mycobacterium avium 104] >gb|EGO40035.1| short-chain alcohol dehydrogenase [Mycobacterium avium subsp. paratuberculosis S397] LigD
817 818 ZP_08717023.1 short chain dehydrogenase [Mycobacterium colombiense CECT 3035] >gb|EGT85268.1| short chain dehydrogenase [Mycobacterium colombiense CECT 3035] LigD
819 820 ZP_05127447.1 oxidoreductase, short chain dehydrogenase/reductase family protein [gamma proteobacterium NOR5-3] >gb|EED33994.1| oxidoreductase, short chain dehydrogenase/reductase family protein [gamma proteobacterium NOR5-3] LigD
821 822 YP_004555419.1 Estradiol 17-beta-dehydrogenase [Sphingobium chlorophenolicum L-1] >gb|AEG50913.1| Estradiol 17-beta-dehydrogenase [Sphingobium chlorophenolicum L-1] LigD
823 824 YP_004230838.1 short-chain dehydrogenase/reductase SDR [Burkholderia sp. CCGE1001] >gb|ADX57778.1| short-chain dehydrogenase/reductase SDR [Burkholderia sp. CCGE1001] LigD
825 826 YP_004284589.1 putative oxidoreductase [Acidiphilium multivorum AIU301] >dbj|BAJ81707.1| putative oxidoreductase [Acidiphilium multivorum AIU301] LigD
827 828 YP_001235233.1 hypothetical protein Acry_2115 [Acidiphilium cryptum JF-5] >gb|ABQ31314.1| short-chain dehydrogenase/reductase SDR [Acidiphilium cryptum JF-5] LigD
829 830 ZP_01617820.1 hypothetical protein GP2143_09415 [marine gamma proteobacterium HTCC2143] >gb|EAW30413.1| hypothetical protein GP2143_09415 [marine gamma proteobacterium HTCC2143] LigD
831 832 ZP_08629833.1 short-chain dehydrogenase/reductase [Bradyrhizobiaceae bacterium SG-6C] >gb|EGP07476.1| short-chain dehydrogenase/reductase [Bradyrhizobiaceae bacterium SG-6C] LigD
833 834 YP_001853014.1 short-chain type dehydrogenase/reductase [Mycobacterium marinum M] >gb|ACC43159.1| short-chain type dehydrogenase/reductase [Mycobacterium marinum M] LigD
835 836 YP_004754457.1 short-chain dehydrogenase/reductase SDR [Collimonas fungivorans Ter331] >gb|AEK63634.1| short-chain dehydrogenase/reductase SDR [Collimonas fungivorans Ter331] LigD
837 838 ZP_05129129.1 short-chain dehydrogenase/reductase SDR [gamma proteobacterium NOR5-3] >gb|EED30944.1| short-chain dehydrogenase/reductase SDR [gamma proteobacterium NOR5-3] LigD
839 840 ZP_05223648.1 short chain dehydrogenase [Mycobacterium intracellulare ATCC 13950] LigD
841 842 YP_004555383.1 short-chain dehydrogenase/reductase SDR [Sphingobium chlorophenolicum L-1] >gb|AEG50877.1| short-chain dehydrogenase/reductase SDR [Sphingobium chlorophenolicum L-1] LigD
843 844 YP_976997.1 short chain dehydrogenase [Mycobacterium bovis BCG str. Pasteur 1173P2] >ref|YP_002643932.1| short-chain dehydrogenase [Mycobacterium bovis BCG str. Tokyo 172] >ref|ZP_06432004.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis T46] >ref|ZP_06449040.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis T17] >ref|ZP_06453700.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis K85] >ref|ZP_06508748.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis T92] >ref|ZP_06512283.1| short chain dehydrogenase [Mycobacterium tuberculosis EAS054] >ref|YP_004722558.1| short-chain type dehydrogenase/reductase [Mycobacterium africanum GM041182] >emb|CAL70889.1| Putative short-chain type dehydrogenase/reductase [Mycobacterium bovis BCG str. Pasteur 1173P2] >dbj|BAH25164.1| short-chain dehydrogenase [Mycobacterium bovis BCG str. Tokyo 172] >gb|EFD12419.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis T46] >gb|EFD42482.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis K85] >gb|EFD46215.1| short- LigD
845 846 ZP_01101659.1 Short-chain dehydrogenase/reductase SDR [Congregibacter litoralis KT71] >gb|EAQ98875.1| Short-chain dehydrogenase/reductase SDR [Congregibacter litoralis KT71] LigD
847 848 ZP_01615364.1 short chain dehydrogenase [marine gamma proteobacterium HTCC2143] >gb|EAW32447.1| short chain dehydrogenase [marine gamma proteobacterium HTCC2143] LigD
849 850 ZP_06436160.1 short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis CPHL_A] >gb|EFD16575.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis CPHL_A] LigD
851 852 NP_854532.1 short chain dehydrogenase [Mycobacterium bovis AF2122/97] >emb|CAD93736.1| PUTATIVE SHORT-CHAIN TYPE DEHYDROGENASE/REDUCTASE [Mycobacterium bovis AF2122/97] LigD
853 854 YP_004744317.1 putative short-chain type dehydrogenase/reductase [Mycobacterium canettii CIPT 140010059] >emb|CCC43191.1| putative short-chain type dehydrogenase/reductase [Mycobacterium canettii CIPT 140010059] LigD
855 856 YP_003947586.1 short-chain dehydrogenase/reductase sdr [Paenibacillus polymyxa SC2] >gb|ADO57345.1| Short-chain dehydrogenase/reductase SDR [Paenibacillus polymyxa SC2] LigD
857 858 YP_003951191.1 short-chain dehydrogenase/reductase [Stigmatella aurantiaca DW4/3-1] >gb|ADO69364.1| Short-chain dehydrogenase/reductase SDR [Stigmatella aurantiaca DW4/3-1] LigD
859 860 YP_583994.1 hypothetical protein Rmet_1846 [Cupriavidus metallidurans CH34] >gb|ABF08725.1| conserved hypothetical protein [Cupriavidus metallidurans CH34] LigD
861 862 NP_215366.1 short chain dehydrogenase [Mycobacterium tuberculosis H37Rv] >ref|YP_001282151.1| short chain dehydrogenase [Mycobacterium tuberculosis H37Ra] >ref|YP_001286813.1| short chain dehydrogenase [Mycobacterium tuberculosis F11] >ref|ZP_02549252.1| short chain dehydrogenase [Mycobacterium tuberculosis H37Ra] >ref|YP_003033128.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis KZN 1435] >ref|ZP_04924487.1| hypothetical protein TBCG_00842 [Mycobacterium tuberculosis C] >ref|ZP_04979832.1| hypothetical short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis str. Haarlem] >ref|ZP_05140274.1| short chain dehydrogenase [Mycobacterium tuberculosis '98-R604 INH-RIF-EM'] >ref|ZP_06444578.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis KZN 605] >ref|ZP_06503955.1| short chain dehydrogenase [Mycobacterium tuberculosis 02_1987] >ref|ZP_06516315.1| short chain dehydrogenase [Mycobacterium tuberculosis T85] >ref|ZP_06520361.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis GM 1503] >ref|ZP_06802023.1| short chain dehydrogenase [Mycobacterium tuberculosis 210] >ref|ZP_06951148.1| short chain LigD
863 864 YP_904525.1 short chain dehydrogenase [Mycobacterium ulcerans Agy99] >gb|ABL03054.1| short-chain type dehydrogenase/reductase [Mycobacterium ulcerans Agy99] LigD
865 866 ZP_06851131.1 short-chain dehydrogenase/reductase family oxidoreductase [Mycobacterium parascrofulaceum ATCC BAA-614] >gb|EFG75472.1| short-chain dehydrogenase/reductase family oxidoreductase [Mycobacterium parascrofulaceum ATCC BAA-614] LigD
867 868 YP_003871369.1 3-oxoacyl-[acyl-carrier-protein] reductase (3-ketoacyl-acyl carrier protein reductase) [Paenibacillus polymyxa E681] >gb|ADM70831.1| 3-oxoacyl-[acyl-carrier-protein] reductase (3-ketoacyl-acyl carrier protein reductase) [Paenibacillus polymyxa E681] LigD
869 870 ZP_05094873.1 oxidoreductase, short chain dehydrogenase/reductase family [marine gamma proteobacterium HTCC2148] >gb|EEB78920.1| oxidoreductase, short chain dehydrogenase/reductase family [marine gamma proteobacterium HTCC2148] LigD
871 872 ZP_01224235.1 probable oxidoreductase dehydrogenase signal peptide protein [marine gamma proteobacterium HTCC2207] >gb|EAS47242.1| probable oxidoreductase dehydrogenase signal peptide protein [marine gamma proteobacterium HTCC2207] LigD
873 874 YP_634033.1 short chain dehydrogenase [Myxococcus xanthus DK 1622] >gb|ABF86178.1| oxidoreductase, short chain dehydrogenase/reductase family [Myxococcus xanthus DK 1622] LigD
875 876 ABL97174.1 short-chain dehydrogenase/reductase [uncultured marine bacterium EBO 49D07] LigD
877 878 NP_335301.1 short chain dehydrogenase [Mycobacterium tuberculosis CDC1551] >ref|ZP_07413312.2| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis SUMu001] >ref|ZP_07668817.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis SUMu010] >ref|ZP_07669069.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis SUMu011] >gb|AAK45115.1| oxidoreductase, short-chain dehydrogenase/reductase family [Mycobacterium tuberculosis CDC1551] >gb|EFO75870.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis SUMu001] >gb|EFP48221.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis SUMu010] >gb|EFP52129.1| short-chain type dehydrogenase/reductase [Mycobacterium tuberculosis SUMu011] LigD
879 880 ZP_01627272.1 short-chain dehydrogenase/reductase SDR [marine gamma proteobacterium HTCC2080] >gb|EAW39988.1| short-chain dehydrogenase/reductase SDR [marine gamma proteobacterium HTCC2080] LigD
881 882 YP_002774647.1 short chain dehydrogenase [Brevibacillus brevis NBRC 100599] >dbj|BAH46143.1| probable short chain dehydrogenase [Brevibacillus brevis NBRC 100599] LigD
883 884 YP_004533909.1 short-chain dehydrogenase/reductase SDR [Novosphingobium sp. PP1Y] >emb|CCA92091.1| short-chain dehydrogenase/reductase SDR [Novosphingobium sp. PP1Y] LigD
885 886 ZP_04751842.1 short chain dehydrogenase [Mycobacterium kansasii ATCC 12478] LigD
887 888 ZP_08271356.1 short-chain dehydrogenase/reductase SDR [gamma proteobacterium IMCC3088] >gb|EGG29327.1| short-chain dehydrogenase/reductase SDR [gamma proteobacterium IMCC3088] LigD
889 890 YP_004666338.1 short chain dehydrogenase [Myxococcus fulvus HW-1] >gb|AEI65260.1| short chain dehydrogenase [Myxococcus fulvus HW-1] LigD
891 892 YP_001704647.1 putative short chain dehydrogenase/reductase [Mycobacterium abscessus ATCC 19977] >emb|CAM63993.1| Putative short chain dehydrogenase/reductase [Mycobacterium abscessus] LigD
893 894 ZP_07283949.1 cis-2,3-dihydrobiphenyl-2,3-diol dehydrogenase [Streptomyces sp. AA4] >gb|EFL12318.1| cis-2,3-dihydrobiphenyl-2,3-diol dehydrogenase [Streptomyces sp. AA4] LigD
895 896 YP_002005492.1 hypothetical protein RALTA_A1476 [Cupriavidus taiwanensis LMG 19424] >emb|CAQ69425.1| putative OXIDOREDUCTASE DEHYDROGENASE [Cupriavidus taiwanensis LMG 19424] LigD
897 898 YP_003543705.1 SDR-family protein [Sphingobium japonicum UT26S] >dbj|BAI95093.1| SDR-family protein [Sphingobium japonicum UT26S] LigD
899 900 YP_759628.1 short chain dehydrogenase/reductase family oxidoreductase [Hyphomonas neptunium ATCC 15444] >gb|ABI75402.1| oxidoreductase, short chain dehydrogenase/reductase family [Hyphomonas neptunium ATCC 15444] LigD
901 902 ZP_03543905.1 short-chain dehydrogenase/reductase SDR [Comamonas testosteroni KF-1] >gb|EED68191.1| short-chain dehydrogenase/reductase SDR [Comamonas testosteroni KF-1] LigD
903 904 YP_003487191.1 hypothetical protein SCAB_14801 [Streptomyces scabiei 87.22] >emb|CBG68626.1| putative PROBABLE SHORT-CHAIN TYPE DEHYDROGENASE/REDUCTASE [Streptomyces scabiei 87.22] LigD
905 906 AEG69105.1 3-oxoacyl-[acyl-carrier-protein] reductase [Ralstonia solanacearum Po82] LigD
907 908 YP_003841993.1 short-chain dehydrogenase/reductase SDR [Clostridium cellulovorans 743B] >ref|ZP_07630916.1| short-chain dehydrogenase/reductase SDR [Clostridium cellulovorans 743B] >gb|ADL50229.1| short-chain dehydrogenase/reductase SDR [Clostridium cellulovorans 743B] LigD
909 910 YP_001899010.1 hypothetical protein Rpic_1437 [Ralstonia pickettii 12J] >gb|ACD26578.1| short-chain dehydrogenase/reductase SDR [Ralstonia pickettii 12J] LigD
911 912 ZP_07965490.1 short chain dehydrogenase [Segniliparus rugosus ATCC BAA-974] >gb|EFV13275.1| short chain dehydrogenase [Segniliparus rugosus ATCC BAA-974] LigD
913 914 NP_250228.1 short-chain dehydrogenase [Pseudomonas aeruginosa PAO1] >ref|ZP_01364886.1| hypothetical protein PaerPA_01001998 [Pseudomonas aeruginosa PACS2] >ref|YP_002441374.1| putative short-chain dehydrogenase [Pseudomonas aeruginosa LESB58] >ref|ZP_04933207.1| hypothetical protein PA2G_00514 [Pseudomonas aeruginosa 2192] >gb|AAG04926.1|AE004582_4 probable short-chain dehydrogenase [Pseudomonas aeruginosa PAO1] >gb|EAZ57326.1| hypothetical protein PA2G_00514 [Pseudomonas aeruginosa 2192] >emb|CAW28518.1| probable short-chain dehydrogenase [Pseudomonas aeruginosa LESB58] >gb|EGM16253.1| putative short-chain dehydrogenase [Pseudomonas aeruginosa 138244] LigD
915 916 YP_001020978.1 hypothetical protein Mpe_A1784 [Methylibium petroleiphilum PM1] >gb|ABM94743.1| putative oxidoreductase dehydrogenase signal peptide protein [Methylibium petroleiphilum PM1] LigD
917 918 YP_003745682.1 oxidoreductase dehydrogenase [Ralstonia solanacearum CFBP2957] >emb|CBJ43067.1| putative oxidoreductase dehydrogenase [Ralstonia solanacearum CFBP2957] LigD
919 920 ADD82954.1 BatM [Pseudomonas fluorescens] LigD
921 922 ZP_06846575.1 short-chain dehydrogenase/reductase family oxidoreductase [Mycobacterium parascrofulaceum ATCC BAA-614] >gb|EFG80090.1| short-chain dehydrogenase/reductase family oxidoreductase [Mycobacterium parascrofulaceum ATCC BAA-614] LigD
923 924 ZP_05041687.1 oxidoreductase, short chain dehydrogenase/reductase family [Alcanivorax sp. DG881] >gb|EDX89108.1| oxidoreductase, short chain dehydrogenase/reductase family [Alcanivorax sp. DG881] LigD
925 926 YP_726036.1 hypothetical protein H16_A1536 [Ralstonia eutropha H16] >emb|CAJ92668.1| conserved hypothetical protein [Ralstonia eutropha H16] LigD
927 928 ZP_08275744.1 Hypothetical Protein IMCC9480_775 [Oxalobacteraceae bacterium IMCC9480] >gb|EGF30787.1| Hypothetical Protein IMCC9480_775 [Oxalobacteraceae bacterium IMCC9480] LigD
929 930 YP_791716.1 putative short-chain dehydrogenase [Pseudomonas aeruginosa UCBPP-PA14] >ref|ZP_06879570.1| putative short-chain dehydrogenase [Pseudomonas aeruginosa PAb1] >ref|ZP_07792770.1| putative short-chain dehydrogenase [Pseudomonas aeruginosa 39016] >gb|ABJ10717.1| putative short-chain dehydrogenase [Pseudomonas aeruginosa UCBPP-PA14] >gb|EFQ37866.1| putative short-chain dehydrogenase [Pseudomonas aeruginosa 39016] >gb|EGM15719.1| putative short-chain dehydrogenase [Pseudomonas aeruginosa 152504] LigD
931 932 CAQ35702.1 oxidoreductase dehydrogenase protein [Ralstonia solanacearum MolK2] LigD
933 934 ZP_07966320.1 short chain dehydrogenase [Segniliparus rugosus ATCC BAA-974] >gb|EFV12481.1| short chain dehydrogenase [Segniliparus rugosus ATCC BAA-974] LigD
935 936 YP_002981437.1 hypothetical protein Rpic12D_1478 [Ralstonia pickettii 12D] >gb|ACS62765.1| short-chain dehydrogenase/reductase SDR [Ralstonia pickettii 12D] LigD
937 938 YP_004685391.1 C alpha-dehydrogenase LigD [Cupriavidus necator N-1] >gb|AEI76910.1| C alpha-dehydrogenase LigD [Cupriavidus necator N-1] LigD
939 940 ZP_00945631.1 Hypothetical Protein RRSL_01608 [Ralstonia solanacearum UW551] >ref|YP_002259522.1| oxidoreductase dehydrogenase protein [Ralstonia solanacearum IPO1609] >gb|EAP71895.1| Hypothetical Protein RRSL_01608 [Ralstonia solanacearum UW551] >emb|CAQ61454.1| oxidoreductase dehydrogenase protein [Ralstonia solanacearum IPO1609] LigD
941 942 NP_519890.1 hypothetical protein RSc1769 [Ralstonia solanacearum GMI1000] >emb|CAD15471.1| probable oxidoreductase dehydrogenase signal peptide protein [Ralstonia solanacearum GMI1000] LigD
943 944 ZP_07676733.1 oxidoreductase dehydrogenase signal peptide protein [Ralstonia sp. 5_7_47FAA] >gb|EFP64736.1| oxidoreductase dehydrogenase signal peptide protein [Ralstonia sp. 5_7_47FAA] LigD
945 946 YP_003752456.1 oxidoreductase dehydrogenase [Ralstonia solanacearum PSI07] >emb|CBJ51176.1| putative oxidoreductase dehydrogenase [Ralstonia solanacearum PSI07] LigD
947 948 YP_004533099.1 hypothetical protein PP1Y_AT3242 [Novosphingobium sp. PP1Y] >emb|CCA91281.1| conserved hypothetical protein [Novosphingobium sp. PP1Y] LigD
949 950 YP_001564386.1 hypothetical protein Daci_3363 [Delftia acidovorans SPH-1] >gb|ABX36001.1| short-chain dehydrogenase/reductase SDR [Delftia acidovorans SPH-1] LigD
951 952 YP_004488753.1 short-chain dehydrogenase/reductase SDR [Delftia sp. Cs1-4] >gb|AEF90398.1| short-chain dehydrogenase/reductase SDR [Delftia sp. Cs1-4] LigD
953 954 YP_001188109.1 short-chain dehydrogenase/reductase SDR [Pseudomonas mendocina ymp] >gb|ABP85377.1| short-chain dehydrogenase/reductase SDR [Pseudomonas mendocina ymp] LigD
955 956 ADP99633.1 short-chain dehydrogenase/reductase SDR [Marinobacter adhaerens HP15] LigD
957 958 YP_693638.1 short-chain dehydrogenase/reductase family protein [Alcanivorax borkumensis SK2] >emb|CAL17366.1| short-chain dehydrogenase/reductase family [Alcanivorax borkumensis SK2] LigD
short-chain dehydrogenase/reductase SDR LigD
[Cupriavidus metallidurans CH34]
>gb|ABF10471.1| short-chain
dehydrogenase/reductase SDR [Cupriavidus
961 962 YP_003277769.1 short-chain dehydrogenase/reductase SDR [Comamonas testosteroni CNB-2] >gb|ACY32473.1| short-chain dehydrogenase/reductase SDR [Comamonas testosteroni CNB-2] LigD
963 964 ZP_08406457.1 hypothetical protein HGR_11311 [Hylemonella gracilis ATCC 19624] >gb|EGI76405.1| hypothetical protein HGR_11311 [Hylemonella gracilis ATCC 19624] LigD
965 966 YP_003842521.1 short-chain dehydrogenase/reductase SDR [Clostridium cellulovorans 743B] >ref|ZP_07632312.1| short-chain dehydrogenase/reductase SDR [Clostridium cellulovorans 743B] >gb|ADL50757.1| short-chain dehydrogenase/reductase SDR [Clostridium cellulovorans 743B] LigD
967 968 ZP_07043693.1 short-chain dehydrogenase/reductase SDR [Comamonas testosteroni S44] >gb|EFI62855.1| short-chain dehydrogenase/reductase SDR [Comamonas testosteroni S44] LigD
969 970 YP_295629.1 hypothetical protein Reut_A1415 [Ralstonia eutropha JMP134] >gb|AAZ60785.1| Short-chain dehydrogenase/reductase SDR [Ralstonia eutropha JMP134] LigD
971 972 CBJ37979.1 putative oxidoreductase dehydrogenase [Ralstonia solanacearum CMR15] LigD
973 974 YP_004155471.1 short-chain dehydrogenase/reductase sdr [Variovorax paradoxus EPS] >gb|ADU37360.1| short-chain dehydrogenase/reductase SDR [Variovorax paradoxus EPS] LigD
975 976 YP_001353681.1 hypothetical protein mma_1991 [Janthinobacterium sp. Marseille] >gb|ABR91341.1| short-chain dehydrogenase/reductase SDR [Janthinobacterium sp. Marseille] LigD
[00259] Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, that there are many equivalents to the specific embodiments described herein that have been described and enabled to the extent that one of skill in the art can practice the invention well beyond the scope of the specific embodiments taught herein. Such equivalents are intended to be encompassed by the following claims. In addition, there are numerous lists and Markush groups taught and claimed herein. One of skill will appreciate that each such list and group contains various species and can be modified by the removal, or addition, of one or more species, since every list and group taught and claimed herein may not be applicable to every embodiment feasible in the practice of the invention. As such, components in such lists can be removed and are expected to be removed to reflect some embodiments taught herein. All publications, patents, patent applications, other references, accession numbers, ATCC numbers, etc., mentioned in this application are herein incorporated by reference into the specification to the same extent as if each was specifically indicated to be herein incorporated by reference in its entirety.
Administrative Status

2024-08-01: As part of the Next Generation Patents (NGP) transition, the Canadian Patents Database (CPD) now contains a more detailed Event History, which replicates the Event Log of our new back-office solution.

Please note that "Inactive:" events refer to events no longer in use in our new back-office solution.

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer, as well as the definitions for Patent, Event History, Maintenance Fee and Payment History, should be consulted.

Event History

Description Date
Application Not Reinstated by Deadline 2017-08-29
Time Limit for Reversal Expired 2017-08-29
Inactive: Abandon-RFE+Late fee unpaid-Correspondence sent 2016-08-29
Deemed Abandoned - Failure to Respond to Maintenance Fee Notice 2016-08-29
Inactive: Compliance - PCT: Resp. Rec'd 2015-01-20
Inactive: Sequence listing - Amendment 2015-01-20
BSL Verified - Defect(s) 2015-01-20
BSL Verified - No Defects 2015-01-20
Inactive: Incomplete PCT application letter 2014-11-26
Inactive: Cover page published 2013-05-27
Inactive: Notice - National entry - No RFE 2013-04-16
Application Received - PCT 2013-04-16
Inactive: First IPC assigned 2013-04-16
Inactive: IPC assigned 2013-04-16
Inactive: IPC assigned 2013-04-16
Inactive: IPC assigned 2013-04-16
Inactive: IPC assigned 2013-04-16
Inactive: IPC assigned 2013-04-16
Inactive: Sequence listing - Refused 2013-03-14
Inactive: Sequence listing - Received 2013-03-14
National Entry Requirements Determined Compliant 2013-03-14
Application Published (Open to Public Inspection) 2012-03-22

Abandonment History

Abandonment Date Reason Reinstatement Date
2016-08-29

Maintenance Fee

The last payment was received on 2015-08-31

Note: If the full payment has not been received on or before the date indicated, a further fee may be required, which may be one of the following:

  • the reinstatement fee;
  • the late payment fee; or
  • additional fee to reverse deemed expiry.

Patent fees are adjusted on the 1st of January every year. The amounts above are the current amounts if received by December 31 of the current year.
Please refer to the CIPO Patent Fees web page to see all current fee amounts.

Fee History

Fee Type Anniversary Year Due Date Paid Date
Basic national fee - standard 2013-03-14
MF (application, 2nd anniv.) - standard 02 2013-08-29 2013-08-20
MF (application, 3rd anniv.) - standard 03 2014-08-29 2014-08-29
2015-01-20
MF (application, 4th anniv.) - standard 04 2015-08-31 2015-08-31
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
ALIGNA TECHNOLOGIES, INC.
Past Owners on Record
GARY Y. LIU
KENNETH MITCHELL
KENNETH ZAHN
RANJINI CHATTERJEE
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents


Document Description  Date (yyyy-mm-dd)  Number of pages  Size of Image (KB)
Description 2013-03-13 134 5,704
Drawings 2013-03-13 12 535
Claims 2013-03-13 15 541
Abstract 2013-03-13 2 103
Representative drawing 2013-05-26 1 50
Cover Page 2013-05-26 2 89
Description 2015-01-19 134 5,704
Reminder of maintenance fee due 2013-04-29 1 114
Notice of National Entry 2013-04-15 1 196
Reminder - Request for Examination 2016-05-01 1 126
Courtesy - Abandonment Letter (Request for Examination) 2016-10-10 1 164
Courtesy - Abandonment Letter (Maintenance Fee) 2016-10-10 1 172
PCT 2013-03-13 11 394
Fees 2014-08-28 1 26
Correspondence 2014-11-25 2 43
Correspondence 2015-01-19 2 59
Fees 2015-08-30 1 26
