Journal of Biotechnology 167 (2013) 462–471

Contents lists available at ScienceDirect

Journal of Biotechnology

journal homepage: www.elsevier.com/locate/jbiotec

Biomining active cellulases from a mining bioremediation system

Keith Mewis a, Zachary Armstrong a, Young C. Song b, Susan A. Baldwin c, Stephen G. Withers a,d,e, Steven J. Hallam a,b,f,∗

a Genome Science and Technology Program, Canada
b UBC Graduate Program in Bioinformatics, Canada
c Department of Chemical and Biological Engineering, Canada
d Department of Chemistry, Canada
e Centre for High Throughput Biology, Canada
f Department of Microbiology & Immunology, Canada

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-No Derivative Works License, which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited.

∗ Corresponding author at: University of British Columbia, Department of Microbiology & Immunology, Life Sciences Institute, 2552-2350 Health Sciences Mall, Vancouver, British Columbia V6T1Z3, Canada. Tel.: +1 604 827 3420; fax: +1 604 822 6041.
E-mail address: [email protected] (S.J. Hallam).

0168-1656/$ – see front matter © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jbiotec.2013.07.015

Article info

Article history:
Received 1 May 2013
Received in revised form 8 July 2013
Accepted 11 July 2013
Available online 29 July 2013

Keywords:
Functional screening
Metagenomics
Cellulase
Mining bioreactor

Abstract

Functional metagenomics has emerged as a powerful method for gene model validation and enzyme discovery from natural and human engineered ecosystems. Here we report development of a high-throughput functional metagenomic screen incorporating bioinformatic and biochemical analyses features. A fosmid library containing 6144 clones sourced from a mining bioremediation system was screened for cellulase activity using 2,4-dinitrophenyl β-cellobioside, a previously proven cellulose model substrate. Fifteen active clones were recovered and fully sequenced revealing 9 unique clones with the ability to hydrolyse 1,4-β-D-glucosidic linkages. Transposon mutagenesis identified genes belonging to glycoside hydrolase (GH) 1, 3, or 5 as necessary for mediating this activity. Reference trees for GH 1, 3, and 5 families were generated from sequences in the CAZy database for automated phylogenetic analysis of fosmid end and active clone sequences revealing known and novel cellulase encoding genes. Active cellulase genes recovered in functional screens were subcloned into inducible high copy plasmids, expressed and purified to determine enzymatic properties including thermostability, pH optima, and substrate specificity. The workflow described here provides a general paradigm for recovery and characterization of microbially derived genes and gene products based on genetic logic and contemporary screening technologies developed for model organismal systems.

© 2013 The Authors. Published by Elsevier B.V. All rights reserved.

1. Introduction

Cellulose, one of the most abundant sources of organic carbon on the planet, has wide-ranging industrial applications, with increasing emphasis on biofuel production. As a result, many studies have extensively focused on the identification of carbohydrate active enzymes, or CAZymes, using both culture-dependent (Rastogi et al., 2009) and culture-independent (Xia et al., 2013) methods. The CAZy database (http://www.cazy.org) currently defines 131 families of glycoside hydrolases (GHs) based on sequence and structure providing a useful resource for functional annotation of predicted GH genes (Cantarel et al., 2009). Seventeen of these families are reported to have cellulase activity, classified by their ability to hydrolyse 1,4-β-D-glucosidic linkages found in cellulose, lichenan, and cereal β-D-glucans. The current production of cellulosic ethanol from non-feedstock crops typically utilizes enzymatic hydrolysis steps to break cellulose into its constituent sugars prior to fermentation (Brethauer and Wyman, 2010). The current high cost of versatile industrial enzymes is a limiting factor in this production (Lee et al., 2010), necessitating the discovery or development of new enzymes that may show more desirable attributes conducive to current cellulosic ethanol pipelines, such as improved acid and temperature stability. Many organisms have been enriched and cultured with this intention, but the biggest reservoir of microbial diversity remains uncultured and untapped within natural and human engineered ecosystems.

To address this cultivation gap, functional metagenomic screens have been developed to recover active genes sourced directly from environmental samples (reviewed in Taupp et al. (2011)). While the discovery of many different enzyme classes has been reported, cellulases have been among the most sought after genes from a biotechnological perspective. Functional metagenomic screens to identify novel cellulases have been conducted on environmental samples from soils (Nacke et al., 2012; Voget et al., 2006), gut microbiomes (Pope et al., 2010; Warnecke et al., 2007) and a biogas plant (Ilmberger et al., 2012). Previously we reported a method using a 384-well plate format to allow for increased throughput and scalability of liquid-assay screens (Mewis et al., 2011). This format allows for the incorporation of automated liquid handling systems and a quantitative readout for accurate comparison between samples.

Here we build upon this automated screening paradigm to recover and characterize active cellulase-encoding fosmid clones sourced from a biochemical reactor (BCR) system designed for remediation of metal contaminated water. The bioreactor used a cellulosic substrate or feedstock composed mostly of pulp mill biosolids to reduce or remove arsenic, cadmium, and zinc from smelter waste seepage through both biotic and abiotic processes (Kawaja et al., 2005; Mattes et al., 2011). We present the development and application of an automated tree-building pipeline for phylogenetic assignment of discovered genes within GH 1, 3 and 5 families and demonstrate an efficient purification and characterization process to constrain the biochemical properties of active clones for biomass transformation within the bioreactor system.

2. Materials and methods

2.1. Fosmid library construction

A fosmid library was constructed using high molecular weight (HMW) DNA extracted from a homogenized core sample derived from a BCR (site details in Supp. Info). This bioreactor operates year round; temperatures ranged from 0 °C to 18 °C between August 2008 and July 2009, when the reactor was sampled. Environmental DNA was cloned into the pCC1 copy control system using Escherichia coli EPI300 as expression host (Epicentre, Madison, WI) as previously described (Taupp et al., 2009). Library construction yielded 6144 fosmid harbouring clones with an average insert size of 42 kilobase (kb) pairs. Bi-directional Sanger end-sequencing was performed using the ABI BigDye kit (Applied Biosystems, Carlsbad, CA) on all clones at Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada with the pCC1 forward (5′-GGATGTGCTGCAAGGCGATTAAGTTGG) and reverse (5′-CTCGTATGTTGTGTGGAATTGTGAGC) sequencing primers.

2.2. High throughput functional screening

Cellulase activities were assayed in 384-well culture plates using 2,4-dinitrophenyl β-cellobioside (gift from Dr. Hongming Chen, Withers Lab, UBC) as reported previously (Mewis et al., 2011). The DNP group is a desirable chromophore for this application due to its lower pKa value than the commonly used p-nitrophenol (pNP) chromophore, allowing the assay to be conducted at full sensitivity at the environmental pH of 6.9. The assay temperature was chosen as 37 °C as this was the optimal temperature for E. coli growth, and hence the temperature at which the enzymes were expressed. Wells were considered cellulase active if they exhibited 400 nm absorbance more than six standard deviations above the sample mean of all plates. A 96-well glycerol stock plate containing six copies of each active clone was created, and a copy of this plate was screened in the same manner to ensure there were no location or volume effects contributing to the hydrolysis of substrate. Screening of this plate enabled a quantitative comparison between the activities of individual clones without solution or plate-location heterogeneity.
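The hit-calling rule described above (400 nm absorbance more than six standard deviations above the mean of all wells) can be illustrated with a minimal sketch. The well names and absorbance values below are hypothetical, and the use of the sample standard deviation is an assumption; the text does not state which estimator was used.

```python
from statistics import mean, stdev


def call_hits(absorbance, n_sd=6.0):
    """Return (threshold, hits): wells whose A400 exceeds mean + n_sd * SD.

    `absorbance` maps a well identifier to its 400 nm reading.
    The sample standard deviation is used here (an assumption).
    """
    values = list(absorbance.values())
    threshold = mean(values) + n_sd * stdev(values)
    hits = sorted(well for well, a in absorbance.items() if a > threshold)
    return threshold, hits


# Hypothetical plate: 383 background wells near A400 = 0.55 plus one active well.
readings = {f"bg{i:03d}": 0.55 + 0.01 * ((i % 7) - 3) for i in range(383)}
readings["P01_A05"] = 1.60

threshold, hits = call_hits(readings)
print(f"threshold = {threshold:.3f}; hits = {hits}")
```

Because the mean and standard deviation are computed over all wells, strong positives inflate the threshold slightly; with thousands of wells and few positives the effect is small.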


2.3. Full fosmid sequencing

Once active clones were identified, fosmid DNA was extracted using the FosmidMax DNA preparation kit (Epicentre) according to the manufacturer's instructions, and further treated with PlasmidSafe DNase (Epicentre) to remove contaminating E. coli chromosomal DNA. DNA concentrations were measured using Quant-iT PicoGreen (Invitrogen, Carlsbad, CA). For full fosmid sequencing, 500 ng of each fosmid was sent to Canada's Michael Smith Genome Sciences Centre (Vancouver, BC). In order to maximize sequencing throughput, 92 individual fosmid samples (not all reported here) were barcoded and sequenced on a single lane of an Illumina GAIIx sequencer (Illumina, San Diego, CA). Contigs were assembled for each well using the barcoded sequences with ABySS v1.2 (Simpson et al., 2009).

2.4. Transposon mutagenesis

For each active clone, a Tn5 transposon mutagenesis library was created (EZ-Tn5 kan insertion kit, Epicentre) to identify which gene on the insert was responsible for detected activity. The region surrounding the insertion was sequenced using an automated DNA sequencer (Applied Biosystems 3730 system, Carlsbad, CA) and primers complementary to the inserted transposon. Sequence data was assembled using phred/phrap (http://www.phrap.org/phredphrapconsed.html) (Ewing et al., 1998), and Consed was used to examine and export resulting contigs into fasta file format (http://www.phrap.org/consed/consed.html) (Gordon et al., 1998). Assembled contigs from these Sanger sequences ranged in size from 382 bp to 5312 bp in length.

A custom Perl script was designed that uses Circos (Krzywinski et al., 2009) to visualize contig relationships between transposon contigs and full sequences. This was necessary to remove "hitch-hiking" sequences originating from other wells during the library construction process. Transposon data for one well (F1) localized to contigs from two separate wells, likely due to contamination during transposon mutagenesis, and thus transposon insertion information from this fosmid was limited.

2.5. Gene finding and open reading frame prediction

Open reading frames (ORFs) were determined for end sequences, full fosmid sequences and transposon contigs using the metagenome option of Prodigal (http://prodigal.ornl.gov/) (Hyatt et al., 2010). End sequences yielded 17,648 predicted ORFs, full fosmid sequencing yielded 623 predicted ORFs and transposon mutagenesis contigs yielded 169 predicted ORFs. These ORFs were compared to the CAZy protein database using BLASTP (Altschul et al., 1997) with an expectation value cutoff of 1 × 10^-20, using criteria reported by Martinez et al. (2010).
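The expectation-value filter applied to the BLASTP comparison can be sketched as follows. This is only an illustration: it assumes the hits are available in the standard 12-column BLAST tabular layout (query, subject, percent identity, ..., E-value, bit score), which the text does not specify, and it keeps the best-scoring hit per ORF as a simplification. The ORF and reference names are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-20  # cutoff stated in the text


def filter_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Keep, for each query ORF, the highest bit-score hit with E-value <= cutoff.

    Assumes 12 tab-separated columns per row, with the E-value in
    column 11 and the bit score in column 12 (1-based).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits: two for orf_1 (both pass), one for orf_2 (fails the cutoff).
example = "\n".join([
    "orf_1\tGH3_refA\t60.0\t700\t280\t5\t1\t700\t10\t705\t1e-150\t520",
    "orf_1\tGH16_refB\t35.0\t200\t130\t4\t1\t200\t5\t204\t1e-25\t110",
    "orf_2\tGH5_refC\t28.0\t150\t108\t6\t1\t150\t3\t150\t1e-10\t60",
])
hits = filter_hits(example)
print(hits)
```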

2.6. Generating reference datasets for GH families 1, 3, and 5

All protein sequences from GH 1 (1630 sequences), GH 3 (2297 sequences), and GH 5 (875 sequences) families were downloaded from the CAZy database in August 2011. Sequences were filtered by length (see supplementary methods) and clustered with UCLUST (Edgar, 2010); representative sequences were aligned with MUSCLE (Edgar, 2004), and then inserted into RAxML (Stamatakis, 2006). Hidden Markov models were generated using hmmbuild (Eddy, 1998). Complete reference datasets include the alignment, the hidden Markov model, and the phylogenetic tree. Following the generation of reference data sets, each family was appended to the MLTreeMap package as a set of functional marker genes. The phylogeny of identified GH 1, GH 3 and GH 5 genes was reconstructed with MLTreeMap and visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Table 1
Substrate specificity of expressed proteins, measured at 100 μM substrate. The maximum observed absorbance change per unit time for each protein was given a 100 percent value and all others are expressed as a percentage of this value.

Substrate                     D2 GH1  F2 GH1  H1 GH3  A1 GH3  D1 GH3  E1 GH5  B1 GH5  B2 GH5  H2 GH5
pNP β-Cellobioside            –       22.8    28.6    –       –       100.0   11.7    16.7    22.6
pNP β-D-Glucopyranoside       100.0   100.0   100.0   100.0   100.0   56.0    28.3    3.1     2.8
pNP β-Lactoside               2.1     1.5     –       –       1.9     –       76.7    100.0   100.0
pNP β-D-Galactopyranoside     27.1    –       1.6     1.0     –       28.0    100.0   9.2     1.0
pNP β-D-Fucopyranoside        70.8    42.7    –       –       –       68.0    15.8    1.3     –
pNP β-D-Xylopyranoside        2.1     2.6     –       –       –       2.0     –       –       –
pNP �-L-Rhamnofuranoside      2.1     –       –       –       –       –       –       –       –
pNP �-D-Mannopyranoside       –       9.1     –       –       –       –       –       –       –
pNP �-L-Arabinofuranoside     –       –       –       –       –       –       –       –       –

pNP: p-Nitrophenyl.

2.7. DNA manipulation and protein expression

Briefly, genes were subcloned into a pET28a vector (Novagen) and expressed in E. coli BL21(DE3). Proteins were purified using His-Bind Resin (Novagen), and concentration determined with the Micro BCA assay (Thermo Scientific). For further details, see Supplementary methods.

2.8. Substrate specificity

Each purified enzyme was tested for activity on nine substrates listed in Table 1. Purified enzyme was added at concentrations as listed in Table 2 to a solution of 100 μM substrate, 50 mM sodium phosphate, 100 mM NaCl. These assays were incubated at 37 °C for 1 h then stopped by the addition of 1 equiv. volume of 1 M glycine (pH 10.0). Absorbance at 400 nm was detected with a BioTek Synergy H1 plate reader.

2.9. pH dependence of kcat/Km

The kcat/Km values for each enzyme were determined using 2,4-dinitrophenyl β-glucopyranoside (DNPG) or 2,4-dinitrophenyl β-cellobioside (DNPC). The hydrolysis of DNPC or DNPG was monitored at 400 nm for each enzyme using substrate depletion kinetics (Vocadlo et al., 2002). kcat/Km values were determined at each pH value from progress curves at low substrate concentrations as mentioned in the supplement. The progress curves were fit using GraFit 5.0 software to the first-order rate equation $A(t) = A_{\mathrm{inf}}\left(1 - e^{-kt} + c\right)$, to yield the pseudo-first-order rate constants. Division of rate constants by enzyme concentration gave kcat/Km values. The pKa values were assigned by fitting the kcat/Km versus pH plots to the equation $k_{cat}/K_m = (k_{cat}/K_m)_{max}\,\dfrac{K_{a1}[\mathrm{H}^+]}{(K_{a1} + [\mathrm{H}^+])(K_{a2} + [\mathrm{H}^+])}$ with GraFit.
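The kinetic relations used in Section 2.9, together with the thermal denaturation expression of Section 2.10, can be written out as a short sketch. This is not the GraFit fitting procedure used by the authors; it only evaluates the stated equations. The function names and the thermodynamic parameters in the example are hypothetical, chosen for illustration.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def kcat_over_km(k_obs, enzyme_conc):
    """Second-order rate constant (M^-1 s^-1) from a pseudo-first-order
    rate constant k_obs (s^-1) and the enzyme concentration (M)."""
    return k_obs / enzyme_conc


def ph_profile(ph, kmax, pka1, pka2):
    """kcat/Km at a given pH for the two-ionization (bell-shaped) model:
    kmax * Ka1*[H+] / ((Ka1 + [H+]) * (Ka2 + [H+]))."""
    h = 10.0 ** (-ph)
    ka1 = 10.0 ** (-pka1)
    ka2 = 10.0 ** (-pka2)
    return kmax * (ka1 * h) / ((ka1 + h) * (ka2 + h))


def fraction_active(temp_k, delta_h, delta_s):
    """[E]/[E0] = 1 / (1 + exp(dS/R - dH/(R*T))), with T in kelvin."""
    return 1.0 / (1.0 + math.exp(delta_s / R - delta_h / (R * temp_k)))


def melting_temperature(delta_h, delta_s):
    """Tm = dH/dS, the temperature at which half of the enzyme is active."""
    return delta_h / delta_s


# Hypothetical example: k_obs = 0.0119 s^-1 measured with 100 nM enzyme.
print(kcat_over_km(0.0119, 100e-9))

# Bell-shaped profile for pKa1 = 4.3 and pKa2 = 6.5; the model peaks
# midway between the two pKa values.
for ph in (4.0, 5.4, 7.0):
    print(ph, ph_profile(ph, 1.0, 4.3, 6.5))

# Hypothetical denaturation parameters giving Tm = 320 K.
dH, dS = 300e3, 937.5
tm = melting_temperature(dH, dS)
print(tm, fraction_active(tm, dH, dS))
```

In this model the maximum of the pH profile lies at (pKa1 + pKa2)/2, and the fraction of active enzyme is one half at T = Tm, which gives two quick sanity checks on any fitted parameters.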

Table 2
Enzyme characteristics for all expressed proteins.

Enzyme  [Enzyme] (nM)  Substrate  Assay temperature (°C)  Optimal pH  Maximum kcat/Km (M^-1 s^-1)  pKa1       pKa2       Tm (°C)
D2 GH1  100            DNPG       37                      5.5         1.19 × 10^5 ± 1.1 × 10^3     4.3 ± 0.1  6.5 ± 0.1  38
F2 GH1  10             DNPG       37                      5.0         1.41 × 10^5 ± 2.9 × 10^3     4.6 ± 0.2  5.7 ± 0.2  46
A1 GH3  1              DNPG       37                      6.5         1.32 × 10^5 ± 5.7 × 10^2     5.2 ± 0.1  8.8 ± 0.2  46
D1 GH3  10             DNPG       37                      5.0         5.8 × 10^3 ± 2.8 × 10        4.2 ± 0.1  5.9 ± 0.1  47
H1 GH3  10             DNPG       37                      5.5         1.46 × 10^5 ± 5.5 × 10^2     4.1 ± 0.1  7.4 ± 0.1  47
B1 GH5  100            DNPC       30                      5.5         7.76 × 10^3 ± 4.1 × 10       4.9 ± 0.2  7.0 ± 0.2  43
B2 GH5  100            DNPC       30                      6.0         2.1 × 10^4 ± 1.8 × 10^2      4.6 ± 0.2  6.8 ± 0.2  45
E1 GH5  1              DNPC       27                      5.5         8.4 × 10^3 ± 6.6 × 10        –          6.4 ± 0.2  42
H2 GH5  100            DNPC       30                      5.5         1.1 × 10^4 ± 5.9 × 10        4.7 ± 0.1  6.3 ± 0.2  53

2.10. Thermal stability

To assess thermal stability of each enzyme, rate constants were determined from substrate depletion kinetics as in Section 2.9 after incubation at a range of temperatures. Division of pseudo-first-order rate constants by the maximum kcat/Km values gave the concentration of active enzyme ([E]). The ratio of active enzyme to total enzyme ([E]/[E0]) was then plotted against temperature and fit to a derivative of the Van't Hoff equation, $[E]/[E_0] = 1/\left(1 + e^{(\Delta S^{\circ}_{D}/R) - (\Delta H^{\circ}_{D}/RT)}\right)$, with GraFit. Melting points (Tm) were determined from the obtained values with the equation $T_m = \Delta H^{\circ}_{D}/\Delta S^{\circ}_{D}$.

2.11. Sequences

Sequences of cellulase active clones were deposited in GenBank under the accession numbers JN695666–JN695680 (Table 3).

3. Results

3.1. Functional screening for active clones

To identify environmental clones conferring cellulase activity from the bioreactor, a fosmid library was constructed using E. coli EPI300 as expression host. A total of 6144 clones were picked and arrayed into sixteen 384-well plates, which were then screened as in Section 2.2. The 2,4-dinitrophenyl β-cellobioside (DNPC) substrate used in this screen specifically identified enzymes with the capacity to hydrolyse β-1,4-glucose linkages (Figure S1), releasing a colorimetric DNP group absorbing strongly at 400 nm. This assay is able to detect endoglucanases and exoglucanases, as well as β-glucosidases. The low pKa of the 2,4-dinitrophenol released both endows the substrate with high reactivity and ensures that the product is fully ionized and maximally absorbing under a wide range of pH conditions. Functional screening yielded 15 active cellulase clones, a recovery rate of 0.24%, or 1 per 410 clones (Fig. 1). To compare the effectiveness of the in-house lysis mix versus a commercial alternative, the library was re-screened using 10× BugBuster protein extraction reagent (Novagen, Whitehouse Station, NJ) in place of lysis buffer. Screening with BugBuster yielded 12 active clones, only one of which (clone E1) had not been identified previously. This additional clone was included in all further downstream analysis. Following functional screening, fosmids from active clones were fully sequenced. Across all fully sequenced fosmids, 623 ORFs were predicted and compared to the CAZy database, identifying 41 GH genes from 15 separate families. Each active clone harboured a GH 1, GH 3, or GH 5 gene (Fig. 2).

Table 3
GenBank accession numbers and top BLASTP hits of identified GH genes based on comparison to the CAZy database. GH 5 genes also list their subfamily according to Aspeborg et al. (2012).

Fosmid clone  Accession number  GH family  Percent identity/similarity  Top BLAST hit from CAZy
A1            JN695666          29         65/79                        Alpha-L-fucosidase [Pedobacter heparinus DSM 2366]
                                16         51/66                        Licheninase [Bacteroides helcogenes P 36-108]
                                3          60/75                        glycoside hydrolase family 3 [Bacteroides helcogenes P 36-108]
B1            JN695668          5 sub 25   46/63                        glycoside hydrolase family 5 [Thermotoga sp. RQ2]
D1            JN695671          16         62/74                        Licheninase [Bacteroides helcogenes P 36-108]
                                3          71/82                        glycoside hydrolase family 3 [Bacteroides helcogenes P 36-108]
E1            JN695673          5 sub NA   39/60                        Endoglucanase [Coprococcus sp. ART55/1]
F1            JN695675          3          60/75                        glycoside hydrolase family 3 [Bacteroides helcogenes P 36-108]
                                16         51/66                        Licheninase [Bacteroides helcogenes P 36-108]
                                29         65/79                        Alpha-L-fucosidase [Pedobacter heparinus DSM 2366]
G1            JN695677          16         51/66                        Licheninase [Bacteroides helcogenes P 36-108]
                                3          60/75                        glycoside hydrolase family 3 [Bacteroides helcogenes P 36-108]
                                55         32/47                        Multifunctional Gluconolactonase containing protein [Opitutus terrae PB90-1]
H1            JN695679          3          56/70                        beta-glucosidase [Robiginitalea biformata HTCC2501]
                                106        41/61                        glycoside hydrolase family 2 [Opitutus terrae PB90-1]
A2            JN695667          30         54/69                        glycoside hydrolase [Zunongwangia profunda SM-A87]
                                2          39/56                        Beta-mannosidase [Asticcacaulis excentricus CB 48]
                                30         64/80                        Multifunctional glycoside hydrolase family 30 domain, glycoside hydrolase family 13 binding domain [Bacteroides vulgatus ATCC8482]
                                5 sub 25   46/64                        endoglucanase [Anaerolinea thermophila UNI-1]
                                65         41/58                        Multifunctional glycoside hydrolase family 65 containing protein [Bifidobacterium bifidum S17]
B2            JN695669          65         41/58                        Multifunctional glycoside hydrolase family 65 containing protein [Bifidobacterium bifidum S17]
                                5 sub 25   46/64                        endoglucanase [Anaerolinea thermophila UNI-1]
                                30         70/83                        Multifunctional glycoside hydrolase family 30 domain, glycoside hydrolase family 13 binding domain [Bacteroides vulgatus ATCC8482]
                                2          39/56                        Beta-mannosidase [Asticcacaulis excentricus CB 48]
                                30         54/69                        glycoside hydrolase [Zunongwangia profunda SM-A87]
C2            JN695670          3          60/75                        glycoside hydrolase family 3 [Bacteroides helcogenes P 36-108]
                                16         51/66                        Licheninase [Bacteroides helcogenes P 36-108]
D2            JN695672          1          47/61                        glycoside hydrolase family 1 [Halothermothrix orenii H 168]
                                95         29/46                        Predicted alpha-fucosidase A [Aspergillus nidulans FGSC A4]
E2            JN695674          16         62/74                        Licheninase [Bacteroides helcogenes P 36-108]
                                3          71/82                        glycoside hydrolase family 3 [Bacteroides helcogenes P 36-108]
F2            JN695676          1          45/60                        beta-glucosidase [Oryza sativa Japonica Group]
G2            JN695678          5 sub 25   46/63                        glycoside hydrolase family 5 [Thermotoga sp. RQ2]
                                43         34/53                        Multifunctional glycoside hydrolase family 43 containing protein [Rhodopirellula baltica SH 1]
                                27         45/61                        Alpha-galactosidase [Paludibacter propionicigenes WB4]
                                43         29/49                        Multifunctional glycoside hydrolase family 43 containing protein [Rhodopirellula baltica SH 1]
H2            JN695680          1          32/49                        Multifunctional glycoside hydrolase family 1 containing protein [Cyanothece sp. PCC 7822]
                                20         57/70                        N-acetyl-beta-hexosaminidase [Bacteroides xylanisolvens XB1A]
                                5 sub 25   45/65                        endoglucanase [Anaerolinea thermophila UNI-1]

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jbiotec.2013.07.015.

3.2. Tn5 transposon mutagenesis

To determine the specific gene or coding region necessary for cellulase activity, transposon mutagenesis was performed as in Section 2.4. For each transposon insertion event, sequences were compared to completed fosmid sequences to generate a histogram of transposon insertion locations (Fig. 2). Contigs from transposon sequencing contained 27 GH genes from 7 GH families. Each fosmid harboured at least one GH gene and insertion events localized to a genomic interval containing a GH 1, GH 3, or GH 5 gene.

For clone A2, 768 transposon clones were picked irrespective of activity to determine how much Tn5 coverage is required to be comparable to that obtained with Illumina index sequencing. For this clone, 1238 individual Sanger reads assembled into two large contigs, 23,541 and 11,805 bp in length. These contigs aligned to the Illumina sequenced fosmid with a 1796 bp gap in between. The per-base coverage of transposon sequencing was 24×, resulting in a cost of $1600 per Mb. The per-base coverage for the indexed Illumina method was 1875×, resulting in a cost of $1.4 per Mb. With time considerations approximately equal, this comparison shows the indexed Illumina method to be superior from both coverage and cost perspectives.
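The comparison above reduces to two ratios, which the following short sketch works out from the figures reported for clone A2. The dictionary keys are illustrative names only.

```python
# Figures reported in the text for clone A2.
sanger = {"coverage_x": 24, "usd_per_mb": 1600.0}    # Tn5 / Sanger reads
illumina = {"coverage_x": 1875, "usd_per_mb": 1.4}   # indexed Illumina

# How many times deeper the Illumina coverage is.
coverage_gain = illumina["coverage_x"] / sanger["coverage_x"]

# How many times cheaper the Illumina method is per Mb.
cost_ratio = sanger["usd_per_mb"] / illumina["usd_per_mb"]

print(f"coverage gain: {coverage_gain:.1f}-fold")
print(f"cost per Mb ratio: {cost_ratio:.0f}-fold")
```

On these numbers the indexed Illumina approach gives roughly 78-fold deeper coverage at more than a thousand-fold lower cost per Mb.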

3.3. Taxonomic identification of active clones

To determine the taxonomic affiliation of active clones, predicted ORFs from each completed fosmid sequence were compared to the nr protein database and assigned using the MEGAN lowest common ancestor rule (LCA) (Fig. 3). Of the 623 ORFs predicted across all fosmids, 405 could be assigned at the class level. A majority of these (329) were assigned to the class Bacteroidia. Based on linkage information, 13 of 15 active clones were most likely derived from donor genotypes affiliated with this class.

Fig. 1. 400 nm absorbance measurements of 6144 clones assayed using reported lysis method. Black dotted line indicates six standard deviations above the mean absorbance (0.89), green dotted line represents mean absorbance of all plates (0.55). Fifteen of the sixteen cellulase-positive clones are seen above the black line, signaling degradation of the DNP-cellobioside substrate. One additional clone was identified using a commercial lysis mix (Bugbuster, Novagen) that is not shown here. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
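The idea behind a lowest common ancestor assignment can be sketched in a few lines: an ORF with hits to several reference taxa is placed at the deepest rank that all of those taxa share. The sketch below is a simplified illustration and not MEGAN's implementation, which also filters hits before applying the rule; the lineages are written out by hand for the example.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of the given lineages.

    Each lineage is a list of taxon names ordered from the highest
    rank (domain) to the lowest rank available.
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # walk the ranks in parallel
        if len(set(ranks)) != 1:
            break                 # first rank where the hits disagree
        shared.append(ranks[0])
    return shared


# Illustrative lineages for two hits of one hypothetical ORF.
hit_lineages = [
    ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
     "Bacteroidaceae", "Bacteroides"],
    ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
     "Porphyromonadaceae", "Paludibacter"],
]
lca = lowest_common_ancestor(hit_lineages)
print(lca)
```

Here the two hits agree down to the order level, so the ORF would still count as assigned at the class level (Bacteroidia) even though its family is ambiguous.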

3.4. Fosmid end sequence analysis

To determine the microbial community structure of the bioreactor, we predicted ORFs from fosmid end sequences and compared them to the NCBI nr protein database to obtain taxonomic information. Of 17,648 ORFs, 9659 (55%) could be assigned at the class level using the MEGAN LCA rule. Based on these results, the class Methanomicrobia dominated the fosmid library (Fig. 4), accounting for 39.4% (6946) of all predicted ORFs. This class of methanogenic archaea has been previously seen in waste treatment bioreactors (Bertin et al., 2011) and is known to utilize acetate or CO2 for energy (Ferry, 2010). The next most abundant class was Bacteroidia, accounting for 6.2% (1094) of all predicted ORFs. This class contains many known organic matter degraders that are widely distributed in the environment including soils (Youssef and Elshahed, 2008), marine environments (Gómez-Pereira et al., 2012), and gut microbiomes (Pati et al., 2011). Only 5 ORFs (<0.03%) were assigned to eukaryotes, suggesting a low representation of eukaryotes in the sample or a bias in the creation of the metagenomic library. Such biases have been previously acknowledged (Liles et al., 2003; Tringe and Rubin, 2005) and may be due to the genomic extraction protocols used or difficulties in cloning heterologous DNA fragments in the E. coli host.

To determine the diversity of GH genes within the bioreactor, ORFs predicted from unassembled fosmid end sequences were compared to the CAZy database. This search identified a total of 130 GH genes from 37 separate GH families (Fig. 5), representing 0.73% of all predicted ORFs. Taxonomic assignment of this gene set using the MEGAN LCA rule identified Bacteroidia as the most abundant class, accounting for 83 ORFs, while 8 belonged to Firmicutes and 39 could not be assigned at the class level. No identified GH genes were assigned to Methanomicrobia, suggesting that the dominant microbial community member does not contribute directly to cellulose degradation within the bioreactor.

Of the 17 GH families identified by the CAZy database as cellulases (EC 3.2.1.4), end sequences contained 6 representatives from 3 families: GH 5, GH 10, and GH 26. These 6 representatives all belonged to the phyla identified by Berlemont and colleagues as cellulose degraders: Bacteroidetes (3), Firmicutes (1), Fibrobacteres (1), and Chloroflexi (1). These genes represent a suite of enzymes selected for and enriched by the initial feedstock of the BCR. They also indicate the presence of a community capable of degrading a variety of polysaccharides.

3.5. Phylogenetic tree construction with GH 1, GH 3 and GH 5genes

Complete sequence sets from GH 1, GH 3 and GH 5 families weremapped onto the corresponding reference trees using MLTreeMapas in Section 2.6 to assess completeness of each reference tree.Both GH 1 and GH 3 reference trees represented the full taxonomicrange of sequences within each family with 95% (1556 of 1630) and55% (1268 of 2297) of all sequences assigned to their respectivetrees. The GH 5 tree was much less representative, with only 27%of sequences (240 of 875) assigned. This is likely due to the highvariation amongst GH 5 sequences that recently resulted in thesplitting of this family into multiple subfamilies (Aspeborg et al.,2012). All but one discovered protein belonged to subfamily 25,prompting production of a new tree containing all sequences fromsubfamily 25 (33 sequences) rooted with the well-characterizedendoglucanase III protein from Trichoderma reesei (belonging tosubfamily 5).

To determine the relative relationships of GH family genes discovered in end sequences and active clones with those in the CAZy database, a maximum likelihood (ML) tree was created for GH 1 and GH 3 families, as well as subfamily 25 of GH 5 (Figures S2–S4). Discovered GH 1 genes did not group near each other, but formed clades with members of the CAZy database (Figure S2). Bootstrap values showed low support for terminal branches near ORF D2 11 GH1, suggesting the assignment is not robust. Six sequences clustered with high (>75%) bootstrap support near ORF F2 16 GH1, suggesting these members form a clade of high similarity. Discovered GH 3 genes grouped into 3 clusters (Figure S3), and protein alignments showed discovered genes in the same cluster had very high similarity (data not shown). Two of these clusters were most similar to two different GH 3 genes found in Bacteroides helcogenes, a known anaerobic pathogen found in pig faeces encoding 100 GH genes from 38 GH families (Pati et al., 2011). The third cluster contained only one gene and was most closely related to Gramella forsetii, a marine bacterium known to degrade polymeric organic matter (Bauer et al., 2006). Five of six functionally discovered GH 5 genes were closely related (Figure S4), with the best match using BLAST belonging to Anaerolinea thermophila, an anaerobic bacterium belonging to the phylum Chloroflexi. The closest sequence on the tree belonged to Thermotoga maritima, belonging to the phylum Thermotogae, isolated from hydrothermal vents, and containing the well studied TmCel5A enzyme (Pereira et al., 2010). The sixth gene (E1 GH5) was distant from these, and was most closely related to Coprococcus sp., belonging to the phylum Firmicutes, an anaerobic bacterium originally isolated from human faeces (Holdeman and Moore, 1974). None of the ORFs from end sequences for GH 1, GH 3 or GH 5 showed close similarity to functionally discovered GH genes.


Fig. 2. Circos representation of completed fosmid sequences using Illumina indexed sequencing. Grey bars represent each fosmid, labeled A1, B1, etc. Outer numbers show scale in kilobases (kb). Coloured bars within fosmids show locations of glycoside hydrolase (GH) genes. Red histogram shows the number and locations of transposon insertions in 1 kb bins during transposon mutagenesis. Connections in the center are coloured by sequence similarity and show regions of nucleotide homology at greater than 90% similarity across intervals of more than 300 bp.

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jbiotec.2013.07.015.

3.6. Substrate specificity

To determine substrate specificity, all cloned genes were assayed with nine different p-nitrophenyl glycosides (Table 1). All purified proteins hydrolysed either a cellobioside or glucopyranoside substrate. All proteins belonging to GH 1 and GH 3 showed optimal activity on the β-D-glucopyranoside; their activity on 2,4-DNP-cellobioside presumably arises from successive glucoside hydrolysis from the non-reducing terminus (shown on the left side of Figure S1). All proteins belonging to GH 5 hydrolysed the cellobioside, though interestingly, only for E1 was the cellobioside the best substrate. All the others had higher relative activities towards either galactose or lactose. Galactose is a C-4 epimer of glucose, suggesting a tolerance for stereoisomers at this locus, as is well known for some GH families with cellulase activity. Many substrates are available for cellulase screening, each of which may identify a different subset of enzymes due to conformational and steric constraints. The fact that three enzymes exhibiting no activity on p-nitrophenyl cellobioside were identified based on hydrolysis of 2,4-dinitrophenyl cellobioside demonstrates the additional sensitivity of DNP-cellobioside, and validates its use as an effective cellulase screening substrate.

Fig. 3. Circos representation of taxonomic diversity at the class level for each completed fosmid. The width of each connection represents the percentage of predicted ORFs from the completed fosmid that were assigned to each class. ORFs were predicted using Prodigal, and MEGAN was used to map ORFs to taxa using the top 10% of matches in blastp queries of the nr database.

3.7. pH dependence of specificity constants

To evaluate the dependence of kcat/Km over a range of pH values, a substrate depletion method was employed (Armstrong et al., 2010). This method is convenient for use in 96 or 384 well format, as it does not rely on prior knowledge of either the extinction coefficient (which will change with pH) or path length (which can change with volumes or plates used) and is insensitive to errors in substrate concentration. pH profiles were determined for all nine proteins and were fitted to an equation for an enzyme activity dependent on two essential ionisations to determine acid dissociation constants (Table 2 and Figure S5). All nine enzymes had optimal pH values between 5 and 6.5, which is possibly a reflection of the pH of the screening assay (5.65), selecting for enzymes active in this range. Most enzymes had a narrow range of activity, with pKa values <2.2 units apart, with the exception of A1 GH3 and H1 GH3, for which the difference of pKa values was >3.3. A1 GH3 was also set apart in that it was active at the highest pH of any of the enzymes. The pKa1 value for E1 GH5 protein could not be determined accurately due to protein instability at pH values lower than 5.5.
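The two ideas in this section can be sketched in a few lines. The exact equation fitted in the study is not reproduced in the text, so the code below assumes the usual textbook form for two essential ionisations, kcat/Km(pH) = limit / (1 + 10^(pKa1 - pH) + 10^(pH - pKa2)). It also assumes that substrate is well below Km, so that the progress curve is a single exponential with observed rate constant k_obs = (kcat/Km)[E]. All function names, pKa values and progress-curve data are hypothetical.

```python
import math


def kcat_km_at_ph(ph: float, limit: float, pka1: float, pka2: float) -> float:
    """Bell-shaped pH profile for an enzyme with two essential ionisations."""
    return limit / (1.0 + 10.0 ** (pka1 - ph) + 10.0 ** (ph - pka2))


def ph_optimum(pka1: float, pka2: float) -> float:
    """The bell-shaped profile peaks midway between the two pKa values."""
    return (pka1 + pka2) / 2.0


def depletion_rate_constant(times, signals, endpoint):
    """Least-squares slope of ln|endpoint - signal| against time, negated.

    The slope does not change if every signal (and the endpoint) is multiplied
    by a constant, which is why no extinction coefficient or path length is needed.
    """
    ys = [math.log(abs(endpoint - s)) for s in signals]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    den = sum((t - mean_t) ** 2 for t in times)
    return -num / den


# Synthetic progress curve (hypothetical values, arbitrary absorbance units).
times = [0.0, 10.0, 20.0, 30.0, 40.0]
true_k = 0.05
endpoint = 2.0
signals = [endpoint * (1.0 - math.exp(-true_k * t)) for t in times]

k_obs = depletion_rate_constant(times, signals, endpoint)
# Same curve read with a threefold larger extinction coefficient or path length.
k_scaled = depletion_rate_constant(times, [3.0 * s for s in signals], 3.0 * endpoint)
print(k_obs, k_scaled)
```

Dividing k_obs by the enzyme concentration gives kcat/Km at that pH; repeating this across a pH series yields the points to which the bell-shaped profile is fitted.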


3.8. Thermal stability

To evaluate thermal stability, each enzyme was incubated at temperatures between 27 °C and 65 °C for 10 min prior to performing a substrate depletion assay at the temperatures listed in Table 2. Pseudo-first order rate constants were calculated which were then used to determine the amount of active enzyme remaining. All enzymes displayed melting points above 37 °C and most values were between 42 °C and 47 °C (Figure S6). The exceptions were D2 GH1, which displayed a Tm of approximately 38 °C, and H2 GH5, which had a higher Tm than all other proteins.
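One way to turn the pseudo-first order rate constants into a melting point is sketched below. It assumes that the rate constant after incubation is proportional to the amount of active enzyme remaining, and it defines Tm as the incubation temperature at which half of the activity survives, found by linear interpolation. Both the definition and the numbers are illustrative assumptions; the text does not state how Tm was derived from the data.

```python
from typing import List, Optional, Sequence


def residual_activity(rate_constants: Sequence[float], reference: float) -> List[float]:
    """Fraction of active enzyme remaining, relative to an untreated reference."""
    return [k / reference for k in rate_constants]


def half_inactivation_temperature(
    temperatures: Sequence[float],
    fractions: Sequence[float],
    threshold: float = 0.5,
) -> Optional[float]:
    """Linearly interpolate the temperature at which activity crosses the threshold."""
    points = list(zip(temperatures, fractions))
    for (t_lo, f_lo), (t_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= threshold > f_hi:
            return t_lo + (f_lo - threshold) * (t_hi - t_lo) / (f_lo - f_hi)
    return None  # activity never dropped below the threshold in the tested range


# Hypothetical rate constants after 10 min at each incubation temperature.
temperatures = [27.0, 37.0, 42.0, 47.0, 55.0, 65.0]
rate_constants = [0.040, 0.040, 0.036, 0.012, 0.002, 0.0004]

fractions = residual_activity(rate_constants, reference=0.040)
tm = half_inactivation_temperature(temperatures, fractions)
print(tm)
```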


4. Discussion

The functional metagenomic workflow described in this study is effective because it leverages basic design principles including automation, natural enrichment, and discerning substrate selection. The combination of in silico and functional screening recovers rare genes encoding important organic matter conversion processes within the BCR and points to different gene expression potential between end sequences and active clones associated with the EPI300 copy control system.

Automation improves the scalability of functional metagenomic screens to handle larger and more complex clone libraries, increasing the likelihood of enzyme discovery. The method implemented here reduces manual plate-handling steps compared to previous approaches (Ferrer et al., 2005; Kim et al., 2007; Nacke et al., 2012) by leveraging automated plate preparation and signal detection steps. These attributes in combination with the potential to utilize different substrates in liquid format indicate that our screening approach should be extensible to a wide range of functional targets. Moreover, fosmid clones identified in the screen are readily expressed and retain activity in E. coli, enabling efficient purification and biochemical characterization.

Selection of an appropriate environment for gene discovery plays an important role in the recovery of functional GH genes (reviewed in Taupp et al., 2011). Previous research by Logan and colleagues identified cellulose hydrolysis as a limiting step in BCR performance, indicating potential enrichment of GH encoding genes (Logan et al., 2005). The BCR used in the current study was ten years old at the time of sampling, suggesting that the more readily degradable carbon was depleted, and more recalcitrant cellulose and lignin compounds remained. The recovery rate of active cellulase genes here approached 1/410 clones, consistent with GH gene enrichment within the BCR. In comparison, functional screening from other environments associated with cellulose or hemicellulose conversion processes revealed rates of 1/2954 in rabbit cecum (Feng et al., 2007), 1/4600 from grassland soils (Nacke et al., 2012) and 1/25,000 from compost soil (Pang et al., 2009).
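The recovery rate can be reproduced from the library size and hit count given in the abstract (6144 clones screened, fifteen active clones). The sketch below also expresses the comparison with other environments as a fold difference in clones screened per hit. The variable names are illustrative, and the fold values are simple ratios of the published rates rather than a statistical comparison.

```python
# Figures from the abstract: 6144 fosmid clones screened, 15 active clones recovered.
library_size = 6144
active_clones = 15

clones_per_hit = library_size / active_clones  # about 410 clones per active clone

# Clones screened per hit reported for other environments (cited in the text).
reported = {
    "rabbit cecum": 2954,
    "grassland soils": 4600,
    "compost soil": 25000,
}

# How many times more clones had to be screened per hit in each environment.
fold_difference = {env: n / clones_per_hit for env, n in reported.items()}

print(f"1 active clone per {clones_per_hit:.0f} clones")
for env, fold in fold_difference.items():
    print(f"{env}: {fold:.1f}x more clones per hit")
```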

While fewer genes have been identified or annotated in functional metagenomic studies, the genes discovered are proven to be intact and functionally active. Thus, the added value of functional metagenomic screening is important to consider given the high profile success of shotgun sequencing and prediction studies (Hess et al., 2011; Pope et al., 2010). The use of fosmid clones for direct screening also provides better taxonomic resolution for discovered genes due to linkage information. For example, the GH 5 gene found on fosmid E1 was affiliated with Coprococcus sp., within the class Clostridia. Genomic context information from flanking genes indicated an alternative donor genotype affiliated with the class Anaerolineae (Fig. 3). Moreover, we showed that functional screening revealed a predominantly non-overlapping set of GH genes when compared to end sequencing alone. This discrepancy could reflect biases in heterologous gene expression in E. coli EPI300 arising from alternative codon or sigma factor usage, or insertion events arising from non-random breakpoints during ligation. In some cases, end sequences harboured only the 3′ end of a GH 1, 3 or 5 gene preventing expression of intact protein products. These partial sequences were still placed into a phylogenetic context using the automated tree building approach for direct comparison with active clones and CAZy reference sequences. The ability to automate sequence clustering and phylogenetic tree construction enables accurate annotation and taxonomic assignment of big data sets based on custom database searches such as CAZy which contains more than 130,000 glycoside hydrolase modules. By placing discovered genes into proper phylogenetic context, divergent proteins can be readily identified, allowing characterization efforts to focus on those enzymes least similar to ones in the database.

Fig. 4. Figure from iTOL representing the taxonomic diversity of predicted ORFs from end sequences (red) and fully sequenced cellulase positive clones (green). The size of the circle reflects the percentage of ORFs from each set of sequences that were assigned to each node in the tree using MEGAN.

Fig. 5. Heatmap showing the abundance and family association of glycoside hydrolase (GH) genes identified from all sequence analyses. Rows show genes identified from full fosmid sequences from positive clones, end sequences, and contigs assembled from transposon knockout mutagenesis.

The obtained temperature and pH profiles for active GH 1, GH 3 and GH 5 proteins were consistent with screening conditions. All proteins showed maximum specificity constants between pH 5.0 and pH 6.5, falling close to the screening pH value of 5.65. Additionally, all proteins displayed melting points above the assay temperature of 37 °C. Interestingly H2 GH5 shows high similarity to the catalytic portions of both B1 GH5 and B2 GH5, though it possesses an additional C-terminal carbohydrate binding module. This domain may function both as a binding module and as a stabilizing element in high temperatures. While the discovery of proteins consistent with the screening method is not surprising, it identifies a potential bias with this method, and suggests there may be more unidentified enzymes that would show activity with an alternative screening method. GHs identified on fosmids B1, B2, and H2 display a broad range of substrate specificity. These sequences have similarity to TmCel5A (Pereira et al., 2010) that also displays a broad range of substrate specificity. We speculate that the ability of these enzymes to act on C-4 epimers of glucose may enable the hydrolysis of glucose chains modified by polysaccharide monooxygenases (such as those characterized by Beeson et al. (2011)) that produce glucose chains oxidized at C-4. However, the anaerobic nature of the BCR indicates that such enzymes were unlikely to be active under in situ conditions.

In conclusion, this study describes the development and implementation of a high-throughput functional metagenomic screen incorporating bioinformatic and biochemical analyses modules that provides a general paradigm for recovery and characterization of microbially derived genes and gene products. We recovered a number of known and novel cellulase encoding genes from a bioreactor system enriched for cellulose degrading phenotypes and show that DNP-cellobioside is an effective substrate for cellulase screening. With this paradigm in mind, future screening efforts that incorporate flow cytometric or microfluidic approaches are needed to increase throughput and reduce freezer footprints and plastic consumption of current plate-based library production methods. Moreover, alternative screening hosts including yeast need to be considered to increase the phylogenetic range and novelty of recovered enzymes for biotechnological applications including biorefining and enhanced bioreactor performance.

Acknowledgments

This work was performed under the auspices of the Natural Sciences and Engineering Research Council (NSERC) of Canada, Genome British Columbia Applied Genomics Innovation Program (GBC-AGIP), Genome Canada, Canada Foundation for Innovation (CFI), and the Canadian Institute for Advanced Research (CIFAR). K. Mewis and Z. Armstrong are supported by NSERC and the Genome Sciences and Technology (GSAT) training program at the University of British Columbia.

References

Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402.

Armstrong, Z., Reitinger, S., Kantner, T., Withers, S.G., 2010. Enzymatic thioxyloside synthesis: characterization of thioglycoligase variants identified from a site-saturation mutagenesis library of Bacillus circulans xylanase. ChemBioChem 11, 533–538.

Aspeborg, H., Coutinho, P., Wang, Y., Brumer, H., Henrissat, B., 2012. Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evolutionary Biology 12, 186.

Bauer, M., Kube, M., Teeling, H., Richter, M., Lombardot, T., Allers, E., Würdemann, C.A., Quast, C., Kuhl, H., Knaust, F., Woebken, D., Bischof, K., Mussmann, M., Choudhuri, J.V., Meyer, F., Reinhardt, R., Amann, R.I., Glöckner, F.O., 2006. Whole genome analysis of the marine Bacteroidetes ‘Gramella forsetii’ reveals adaptations to degradation of polymeric organic matter. Environmental Microbiology 8, 2201–2213.

Beeson, W.T., Phillips, C.M., Cate, J.H.D., Marletta, M.A., 2011. Oxidative cleavage of cellulose by fungal copper-dependent polysaccharide monooxygenases. Journal of the American Chemical Society 134, 890–892.

Bertin, L., Capodicasa, S., Fedi, S., Zannoni, D., Marchetti, L., Fava, F., 2011. Biotransformation of a highly chlorinated PCB mixture in an activated sludge collected from a Membrane Biological Reactor (MBR) subjected to anaerobic digestion. Journal of Hazardous Materials 186, 2060–2067.

Brethauer, S., Wyman, C.E., 2010. Review: Continuous hydrolysis and fermentation for cellulosic ethanol production. Bioresource Technology 101, 4862–4874.

Cantarel, B.L., Coutinho, P.M., Rancurel, C., Bernard, T., Lombard, V., Henrissat, B., 2009. The carbohydrate-active enzymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Research 37, D233–D238.

Eddy, S.R., 1998. Multiple-alignment and -sequence searches. Trends in Biotechnology 16 (Suppl. 1), 15–18.

Edgar, R., 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461.

Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792–1797.

Ewing, B., Hillier, L., Wendl, M.C., Green, P., 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research 8, 175–185.

Feng, Y., Duan, C.J., Pang, H., Mo, X.C., Wu, C.F., Yu, Y., Hu, Y.L., Wei, J., Tang, J.L., Feng, J.X., 2007. Cloning and identification of novel cellulase genes from uncultured microorganisms in rabbit cecum and characterization of the expressed cellulases. Applied Microbiology and Biotechnology 75, 319–328.

Ferrer, M., Golyshina, O.V., Chernikova, T.N., Khachane, A.N., Reyes-Duarte, D., Santos, V.A.P.M.D., Strompl, C., Elborough, K., Jarvis, G., Neef, A., Yakimov, M.M., Timmis, K.N., Golyshin, P.N., 2005. Novel hydrolase diversity retrieved from a metagenome library of bovine rumen microflora. Environmental Microbiology 7, 1996–2010.

Ferry, J.G., 2010. How to make a living by exhaling methane. Annual Review of Microbiology 64, 453–473.

Gómez-Pereira, P.R., Schüler, M., Fuchs, B.M., Bennke, C., Teeling, H., Waldmann, J., Richter, M., Barbe, V., Bataille, E., Glöckner, F.O., Amann, R., 2012. Genomic content of uncultured Bacteroidetes from contrasting oceanic provinces in the North Atlantic Ocean. Environmental Microbiology 14, 52–66.

Gordon, D., Abajian, C., Green, P., 1998. Consed: a graphical tool for sequence finishing. Genome Research 8, 195–202.

Hess, M., Sczyrba, A., Egan, R., Kim, T.-W., Chokhawala, H., Schroth, G., Luo, S., Clark, D.S., Chen, F., Zhang, T., Mackie, R.I., Pennacchio, L.A., Tringe, S.G., Visel, A., Woyke, T., Wang, Z., Rubin, E.M., 2011. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–467.

Holdeman, L.V., Moore, W.E.C., 1974. New genus, Coprococcus, twelve new species, and emended descriptions of four previously described species of bacteria from human feces. International Journal of Systematic Bacteriology 24, 260–277.

Hyatt, D., Chen, G.-L., LoCascio, P., Land, M., Larimer, F., Hauser, L., 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119.

Ilmberger, N., Meske, D., Juergensen, J., Schulte, M., Barthen, P., Rabausch, U., Angelov, A., Mientus, M., Liebl, W., Schmitz, R., Streit, W., 2012. Metagenomic cellulases highly tolerant towards the presence of ionic liquids – linking thermostability and halotolerance. Applied Microbiology and Biotechnology 95, 135–146.

Kawaja, J.D.E., Morin, K., Gould, W.D., 2005. A duplicate column study of arsenic, cadmium and zinc treatment in an anaerobic bioreactor based on a system operated by Teck Cominco in Trail, British Columbia. In: British Columbia Mine Reclamation Symposium 2005, Abbotsford, British Columbia, Canada.

Kim, S.J., Lee, C.M., Kim, M.Y., Yeo, Y.S., Yoon, S.H., Kang, H.C., Koo, B.S., 2007. Screening and characterization of an enzyme with beta-glucosidase activity from environmental DNA. Journal of Microbiology and Biotechnology 17, 905–912.

Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J., Marra, M.A., 2009. Circos: an information aesthetic for comparative genomics. Genome Research 19, 1639–1645.

Lee, S.-M., Jin, L., Kim, J., Han, S., Na, H., Hyeon, T., Koo, Y.-M., Kim, J., Lee, J.-H., 2010. β-Glucosidase coating on polymer nanofibers for improved cellulosic ethanol production. Bioprocess and Biosystems Engineering 33, 141–147.

Liles, M.R., Manske, B.F., Bintrim, S.B., Handelsman, J., Goodman, R.M., 2003. A census of rRNA genes and linked genomic sequences within a soil metagenomic library. Applied and Environmental Microbiology 69, 2684–2691.

Logan, M.V., Reardon, K.F., Figueroa, L.A., McLain, J.E.T., Ahmann, D.M., 2005. Microbial community activities during establishment, performance, and decline of bench-scale passive treatment systems for mine drainage. Water Research 39, 4537–4551.

Martinez, A., Tyson, G.W., DeLong, E.F., 2010. Widespread known and novel phosphonate utilization pathways in marine bacteria revealed by functional screening and metagenomic analyses. Environmental Microbiology 12, 222–238.

Mattes, A., Evans, L.J., Douglas Gould, W., Duncan, W.F.A., Glasauer, S., 2011. The long term operation of a biologically based treatment system that removes As, S and Zn from industrial (smelter operation) landfill seepage. Applied Geochemistry 26, 1886–1896.

Mewis, K., Taupp, M., Hallam, S.J., 2011. A high throughput screen for biomining cellulase activity from metagenomic libraries. Journal of Visualized Experiments 48, 2461.

Nacke, H., Engelhaupt, M., Brady, S., Fischer, C., Tautzt, J., Daniel, R., 2012. Identification and characterization of novel cellulolytic and hemicellulolytic genes and enzymes derived from German grassland soil metagenomes. Biotechnology Letters 34, 663–675.

Pang, H., Zhang, P., Duan, C.-J., Mo, X.-C., Tang, J.-L., Feng, J.-X., 2009. Identification of cellulase genes from the metagenomes of compost soils and functional characterization of one novel endoglucanase. Current Microbiology 58, 404–408.

Pati, A., Gronow, S., Zeytun, A., Lapidus, A., Nolan, M., Hammon, N., Deshpande, S., Cheng, J.-F., Tapia, R., Han, C., Goodwin, L., Pitluck, S., Liolios, K., Pagani, I., Ivanova, N., Mavromatis, K., Chen, A., Palaniappan, K., Land, M., Hauser, L., Chang, Y.-J., Jeffries, C.D., Detter, J.C., Brambilla, E., Rohde, M., Göker, M., Woyke, T., Bristow, J., Eisen, J.A., Markowitz, V., Hugenholtz, P., Kyrpides, N.C., Klenk, H.-P., Lucas, S., 2011. Complete genome sequence of Bacteroides helcogenes type strain (P36-108 T). Standards in Genomic Sciences 4, 45–53.

Pereira, J.H., Chen, Z., McAndrew, R.P., Sapra, R., Chhabra, S.R., Sale, K.L., Simmons, B.A., Adams, P.D., 2010. Biochemical characterization and crystal structure of endoglucanase Cel5A from the hyperthermophilic Thermotoga maritima. Journal of Structural Biology 172, 372–379.

Pope, P.B., Denman, S.E., Jones, M., Tringe, S.G., Barry, K., Malfatti, S.A., McHardy, A.C., Cheng, J.-F., Hugenholtz, P., McSweeney, C.S., Morrison, M., 2010. Adaptation to herbivory by the Tammar wallaby includes bacterial and glycoside hydrolase profiles different from other herbivores. Proceedings of the National Academy of Sciences 107, 14793–14798.

Rastogi, G., Muppidi, G., Gurram, R., Adhikari, A., Bischoff, K., Hughes, S., Apel, W., Bang, S., Dixon, D., Sani, R., 2009. Isolation and characterization of cellulose-degrading bacteria from the deep subsurface of the Homestake gold mine, Lead, South Dakota, USA. Journal of Industrial Microbiology and Biotechnology 36, 585–598.

Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J.M., Birol, I., 2009. ABySS: a parallel assembler for short read sequence data. Genome Research 19, 1117–1123.

Stamatakis, A., 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.

Taupp, M., Lee, S., Hawley, A., Yang, J., Hallam, S.J., 2009. Large insert environmental genomic library production. Journal of Visualized Experiments 31, 1387.

Taupp, M., Mewis, K., Hallam, S.J., 2011. The art and design of functional metagenomic screens. Current Opinion in Biotechnology 22, 465–472.

Tringe, S.G., Rubin, E.M., 2005. Metagenomics: DNA sequencing of environmental samples. Nature Reviews Genetics 6, 805–814.

Vocadlo, D.J., Wicki, J., Rupitz, K., Withers, S.G., 2002. Mechanism of Thermoanaerobacterium saccharolyticum β-xylosidase: kinetic studies. Biochemistry 41, 9727–9735.

Voget, S., Steele, H.L., Streit, W.R., 2006. Characterization of a metagenome-derived halotolerant cellulase. Journal of Biotechnology 126, 26–36.

Warnecke, F., Luginbuhl, P., Ivanova, N., Ghassemian, M., Richardson, T.H., Stege, J.T., Cayouette, M., McHardy, A.C., Djordjevic, G., Aboushadi, N., Sorek, R., Tringe, S.G., Podar, M., Martin, H.G., Kunin, V., Dalevi, D., Madejska, J., Kirton, E., Platt, D., Szeto, E., Salamov, A., Barry, K., Mikhailova, N., Kyrpides, N.C., Matson, E.G., Ottesen, E.A., Zhang, X., Hernandez, M., Murillo, C., Acosta, L.G., Rigoutsos, I., Tamayo, G., Green, B.D., Chang, C., Rubin, E.M., Mathur, E.J., Robertson, D.E., Hugenholtz, P., Leadbetter, J.R., 2007. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 450, 560–565.

Xia, Y., Ju, F., Fang, H.H.P., Zhang, T., 2013. Mining of novel thermo-stable cellulolytic genes from a thermophilic cellulose-degrading consortium by metagenomics. PLoS ONE 8, e53779.

Youssef, N.H., Elshahed, M.S., 2008. Diversity rankings among bacterial lineages in soil. ISME Journal 3, 305–313.

