Page 1: A Town on Fire - Susquehanna Universitycomenius.susqu.edu/biol/312/centralia metagenomics … · Web viewA Primer on Metagenomics Computer Resources: Macintosh computer running Java

Spring 2017
BIOL 312: Microbiology

A Town on Fire
Metagenomic Analysis of Bacterial Communities in Soils Overlying the Centralia, Pennsylvania Mine Fire

Instructor: Dr. Tammy Tobin
Susquehanna University
E-Mail: [email protected]

Overview
In 1962, a surface trash fire ignited an anthracite coal seam in an abandoned strip mine in Centralia, Pennsylvania. Repeated efforts to extinguish the fire failed, and in 1984 Congress responded to the resulting high carbon monoxide levels and frequent land collapses by allocating more than $42 million for relocation efforts. Most of the residents have long since moved, and their homes have been demolished, leaving behind a ghost town where a coal mining community once thrived (Fig 1).

Figure 1: Above: Centralia, PA prior to the evacuation in 1984. The town had over 1800 residents, several businesses and churches. Right: Old Route 61 through Centralia (taken in 1997) showing steam, rich in carbon monoxide, venting upward through cracks caused by land collapses.

Metagenomic Analysis of Bacterial Communities in Soils Overlying the Centralia, Pennsylvania Mine Fire 1


As a result of this mine fire, surface soil temperatures in affected areas regularly exceed 60°C and soils surrounding the vents are often rich in combustion products such as sulfur and nitrogen that microbial communities can use and transform as a part of their energy-generating processes.

In this case study, you will use information in papers that describe typical geothermal soils and their microbial communities to hypothesize a single bacterial genus that you would expect to find living in Centralia’s fire-affected soils. You will use metagenomic analysis to test your hypothesis and then make a presentation that reports your findings and predicts the types of impacts that members of your genus might be having on the Centralia ecosystem.


Learning Objectives:
As a result of participation in these activities, students will be able to:

1. Explain each step in the generation and analysis of pyrosequencing-based metagenomic 16S rRNA sequence data.

2. Discuss the basic evolutionary assumptions that underlie metagenomic sequence analysis.

3. Evaluate the strengths and weaknesses of the methods employed in the metagenomic analysis of microbial populations, including the impact that data quality has on bioinformatics analyses.

4. Choose and justify the appropriate methods for metagenomics analysis of bacterial community structure.

5. Propose valid hypotheses regarding bacterial community structure, and use bioinformatics to test those hypotheses.

Evaluation
The final evaluation of this project will be based on the successful completion of Team Application Activities and a Final Paper.

Figure 2: Steam from “Anthracite Smokers” in Centralia, PA carries dissolved combustion products, such as nitrogen and sulfur, to the surface through soil fractures. As the steam rises it cools and precipitates chemicals into the surrounding soils where they can be utilized and transformed by nitrogen and sulfur-cycling bacterial communities.

Materials
Recommended Readings:
A Primer on Metagenomics

Computer Resources:
Macintosh computer running Java version 7 or higher.

Access to Amazon Web Services EC2 medium or large instance.

FigTree

Metagenomics Sequence Resources:
Centralia Metagenomics files Cen37, Cen95 and Cen125 are available through the GCAT-SEEK consortium at http://lycofs01.lycoming.edu/~gcat-seek/index.html

Team Application Activities:

Introduction
Students learn about the history and biogeochemistry of the Centralia Mine Fire environment and take the GCAT-SEEK pre-test.

Team Activity #1
Students work in teams to familiarize themselves with LINUX and QIIME.

Team Activity #2
Students propose hypotheses regarding the types of microbial species they expect to see in thermophilic versus mesophilic soils in Centralia and use QIIME to test their hypotheses.

Team Activity #3
Students use QIIME and FigTree to perform alpha diversity and phylogenetic analyses and begin to prepare their presentations.


Final Presentation
Each student team presents their metagenomic findings.

Team Application Activity #1: An Introduction to Next Generation Sequencing, Metagenomic Analysis, LINUX and QIIME

Next Generation Sequencing and Pyrosequencing
“Next generation (Next Gen) sequencing” is a term that encompasses a variety of DNA sequencing technologies, all of which have a common core approach: they use DNA polymerase to generate thousands or millions of relatively short (compared to traditional sequencing technologies) sequences of a DNA template concurrently. Thus, these sequencing technologies are often referred to as being ‘massively parallel’. They then differ in the manner in which they determine when (and which) base is added to the replicating DNA (that is, in how they actually “read” the sequence). For example, Ion Torrent sequencing uses the tiny pH change that happens each time a new phosphodiester bond is created to determine whether or not a particular base was added.

The data that we will be using were generated using a technology called ‘pyrosequencing’ (Figure 3). In this study, DNA was isolated directly from soil samples using a MoBio Powersoil DNA Isolation Kit. Next, a metagenomic 16S rRNA gene library was generated using PCR to amplify a portion of that gene from every bacterial species present in the soil DNA sample. Short adaptors (shown as red and green bars in the figure) were then ligated onto the ends of the PCR-generated fragments. The first adaptor was used to attach the DNA fragments onto streptavidin-coated beads. The second adaptor was used for amplification and sequencing of the fragments. The DNA library was then treated to make it single-stranded, and immobilized onto the beads at a dilution that ensured that each bead contained only a single, unique DNA fragment (Step 4).

The bead-bound library was emulsified with PCR reagents in a water-in-oil mixture. Each bead was captured within its own microreactor, where PCR amplification occurred. This resulted in approximately 10 million copies of a single sequence (Step 5) attached to each bead.

Figure 3: Pyrosequencing

The beads, containing the amplified, single-stranded, template DNA library, were then added to individual wells of a PicoTiterPlate (Step 6) that contained the DNA polymerase, sulfurylase (APS) and luciferase enzymes. The latter two enzymes produce a flash of light if DNA polymerase successfully adds a base to the growing end of a daughter DNA strand during the sequencing reactions (Step 7).

The loaded PicoTiterPlate device was placed into the sequencer, which flooded all of the wells (each well, as you remember, has a different DNA fragment in it) with sequencing reagents containing buffers, primers and one of the bases. Let’s say G was added to all of the wells first. Since each well has a unique piece of DNA, the G will be complementary to the first base of the template DNA in some (but not all) of the wells. Thus, DNA polymerase will only add it to the growing daughter strand in those (complementary) wells. Multiple G’s will be added at this time if the template strand has more than one C in a row (e.g. CCC in positions 1, 2 and 3). Addition of one or more nucleotide(s) generates a flash of light, as previously described. The signal strength of the flash is proportional to the number of nucleotides added, so a GGG sequence will have a light signal three times as bright as a single G. If the base that is added is not complementary to the template strand, no light will be generated.

When an entire plate is flooded with the sequencing reagents in this manner, some of the wells will glow and some will not. The sequencer can detect the light flashes, and will record which of the wells incorporated a G. The wells are then washed, the next base is added (either A, T or C) and the whole process is repeated, sequentially. After each addition, the sequencer ‘reads’ which wells incorporated the new base. This process is then repeated many times, ultimately generating short (up to several hundred base pair) sequences of all of the unique fragments in all of the wells at the same time…massively parallel, indeed!
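The flow-by-flow logic described above can be sketched in a few lines of Python. This is only an illustration, not how the sequencer's software works: the function names, the G-A-T-C flow order and the example template CCCATG are all assumptions made for this sketch (the text only says that G might be added first). The signal for each flow is simply the number of bases incorporated, which is why a run of three C's in the template gives a G signal of 3.

```python
# Illustrative sketch of pyrosequencing flows for a single well.
# Assumptions (not from the handout): flow order G, A, T, C and the
# example template "CCCATG".

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flowgram(template, flow_order="GATC", max_flows=40):
    """Return a list of (base, signal) pairs, one per nucleotide flow.

    template is the strand being copied, written in the order in which
    the polymerase encounters its bases.
    """
    position = 0
    flows = []
    for i in range(max_flows):
        if position >= len(template):
            break  # the whole template has been copied
        base = flow_order[i % len(flow_order)]
        signal = 0
        # Keep adding this base while it pairs with the next template base.
        while position < len(template) and COMPLEMENT[template[position]] == base:
            signal += 1
            position += 1
        flows.append((base, signal))
    return flows


def call_bases(flows):
    """Rebuild the daughter-strand sequence from the flow signals."""
    return "".join(base * signal for base, signal in flows)


flows = simulate_flowgram("CCCATG")
print(flows)
print(call_bases(flows))
```

A flow with signal 0 corresponds to a well that stays dark when that base is washed over the plate.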

Metagenomic Analysis of Bacterial 16S rRNA genes
As you probably recall from your introductory biology classes, protein synthesis (translation) is catalyzed by the ribosome, a complex structure composed of both proteins and ribosomal RNA (rRNA). Translation begins when the small subunit of the ribosome locates and binds to the 5’ end of the mRNA. Once this has happened, the large ribosomal subunit can attach, and the complete ribosome translocates along the mRNA, catalyzing the formation of peptide bonds between amino acids as they are conveyed to the correct mRNA codon by tRNA.

In order for this process to work correctly, the ribosome must first be able to find the 5’ end of an mRNA and bind to it. That recognition is the job of the 16S rRNA, which contains a 3’ sequence that is complementary to the 5’ end of the mRNA. The 16S rRNA genes have been used extensively for phylogenetic analysis since Carl Woese and George Fox first proposed their use in 1977 (Woese and Fox, 1977). Because of its critical function in translation, the 16S rRNA gene contains highly conserved sequences that can be used to design ‘universal’ PCR primers (primers that work for almost all species), as well as highly variable regions that allow taxon identification based on sequence comparison to known taxa. Well-curated databases, such as the Ribosomal Database Project (Cole, et al. 2003) and the Greengenes database (DeSantis et al 2006) contain regularly updated versions of all of the known 16S rRNA gene sequences, along with their phylogenetic assignments, and are invaluable in this process.

Metagenomic analysis (Handelsman, et al. 1998) is the analysis of genetic samples recovered directly from the environment, without any attempt to isolate the microbes from which they came. This type of analysis allows microbiologists to study the vast numbers of uncultured, or unculturable, microbes in any environment. Current high-profile examples of this type of analysis include the Human Microbiome project, the Earth Microbiome project and the Maternal Microbiome project.

In preparation for this case study, soil was collected from 3 boreholes in Centralia, PA (37°C, 52°C and 60°C), and genomic DNA was directly isolated from the samples using the MoBio Powersoil Kit. PCR with universal bacterial 16S rRNA primers was then used to make copies of all of the bacterial 16S rRNA genes in each of these samples. These PCR products were then used as the template for Roche 454 pyrosequencing at the Penn State University genomics lab. You will be using this data to test hypotheses regarding the types of bacteria that live in the hot soils overlying the Centralia, PA mine fire. But first, you must learn a bit about the program that you will be using to perform the analyses.

An Introduction to QIIME
(Adapted from Regina Lamendella, GCAT-SEEK Metagenomics Workshop, Summer 2013)

Quantitative Insights into Microbial Ecology (QIIME) is an open source pipeline that runs in a LINUX environment. It can be used to process next generation sequencing data in a variety of ways that range from making sure that all of your sequences are of high enough quality to be used (quality trimming), to performing a whole suite of phylogenetic and statistical analyses on the quality trimmed data. We will be utilizing many of these functions in this case study, but first you must get used to working in the LINUX environment using the Mac Terminal, which is part of the operating system on all Macs. We will be using the terminal to access QIIME on an Amazon Web Services EC2 server. The Linux and QIIME tutorials that follow are largely the work of Dr. Regina Lamendella at Juniata College. I have tweaked them a bit to be appropriate for our operating system and case study.

Unix/Linux Tutorial
Linux is an open-source Unix-like operating system. It allows the user considerable flexibility and control over the computer by command line interaction. Many bioinformatics pipelines are built for the Unix/Linux environment; therefore it is a good idea to become familiar with Linux basics before beginning bioinformatics.

Every desktop computer uses an operating system. The most popular operating systems in use today are Windows, Mac OS, and UNIX. Linux is an operating system very much like UNIX, and it has become very popular over the last several years. Operating systems are computer programs. An operating system is the first piece of software that the computer executes when you turn the machine on. The operating system loads itself into memory and begins managing the resources available on the computer. It then provides those resources to other applications that the user wants to execute.

The shell - The shell acts as an interface between the user and the kernel. When a user logs in, the login program checks the username and password, and then starts another program called the shell. The shell is a command line interpreter (CLI): it interprets the commands the user types in and arranges for them to be carried out. The commands are themselves programs; when one terminates, the shell displays another prompt ($ on our systems) to let the user know that the program has finished.

Useful LINUX shortcuts:


Filename Completion - By typing part of the name of a command, filename or directory and pressing the [Tab] key, the shell will complete the rest of the name automatically. If the shell finds more than one name beginning with those letters you have typed, it will pause, prompting you to type a few more letters before pressing the tab key again.

History - The shell keeps a list of the commands you have typed in. If you need to repeat a command, use the cursor keys to scroll up and down the list or type “history” for a list of previous commands.

Files and Processes
Everything in UNIX is either a file or a process. A process is an executing program identified by a unique process identifier. A file is a collection of data. Files are created by users using text editors, running compilers, etc.

Examples of files:

A document (report, essay, etc.)

The Centralia metagenomic sequences and related files

A directory, containing information about its contents, which may be a mixture of other directories (subdirectories) and ordinary files. For example, your Centralia_Case_Study folder is a directory.

It is not required to have a Linux operating system to use QIIME. We will be running the Linux environment through the Mac Terminal. So first let’s get started.

Team Application Activity # 5: Practicing with LINUX and MacQIIME

Names of Team Members:

Part One: Getting your Files Ready.

Step One: Downloading the required files to your desktop.

1. Get into pairs and get one Mac laptop per pair of students. Create a new Desktop folder entitled Centralia_Case_Study. Your folder must have exactly that name or subsequent commands in the case study will not work.

2. Log into Blackboard and go to our course web site. Under contents you should see a new folder entitled “Centralia Case Study Materials.” Download catqual, mapping_centralia.txt and catfna to your Centralia_Case_Study folder.

3. Download QIIME-1.pem to your desktop, but NOT into the Centralia_Case_Study folder.

*Step Two: Upload your Centralia_Case_Study folder to your EC2 instance using Cyberduck. This has been done for you. I simply include the instructions in case we need to redo it!*

1. Start Cyberduck. It is in your applications folder. Cyberduck allows you to transfer files between your computer and the Amazon EC2 instance you are running.


2. Click on “Open Connection” in the upper left of the window, and choose SFTP from the top drop-down menu.

3. Type the Public DNS (IPv4) address for your EC2 instance (provided by your instructor) into the Server window in Cyberduck.

4. Type ubuntu (exactly as written!) as your username. You do not need a password.

5. Click on ‘Use Public Key Authentication’ at the bottom of the Cyberduck window and then navigate to and choose your key pair (it should be on your desktop as QIIME-1.pem). Your Cyberduck window should now look something like this:

6. If it does, click on “Connect”. Then click on ‘Allow’. A new window will open up in Cyberduck when it has connected to the server. To copy all of your Centralia files to your EC2 instance, simply drag and drop your Centralia_Case_Study folder into this window. Click on ‘Allow’. Copying these files will take a while, so we will now move on to practicing with Linux.

Part Three: Practicing with the LINUX environment worksheet

Names:

Instructions: Follow each of the steps below, and answer the questions in the spaces provided. Please note that spelling, spaces, cases, etc. are absolutely critical in LINUX. You can always type history to see what you typed if you get an error message.

4. Double click on the Terminal program icon (located in Applications – Utilities) to open it. You should see your user name followed by a dollar sign (username $). Every time you see a $, it is a prompt that is telling you that it is ready for the next programming command.

5. Understanding the file structure and knowing how to use some basic Linux commands are essential for using QIIME effectively. Below is a simplified version of the file structure of an example distribution of Linux.


6. The file structure is important when we use the command line, since we need to tell the shell where to find certain files, or where to output the results. The full path to qiime in this example would be /home/qiime.

7. In the space below, answer the following question: If you want to work in the Shared_Folder directory, what is the full path that you would type to get there?

8. You can always determine which directory you are working in by typing pwd. Type it now. What do you see? That is your home directory. Write it down in the space below.

9. If you want to list the contents of a directory, you use the command ls (list). Which files and directories do you see in your home directory? List a few.

10. In order to change directories, you will use the command cd. Navigate to the Desktop directory (it should have shown up when you typed ls) by typing in cd Desktop. The command line should now indicate that you are in the Desktop directory.

11. List a few of the contents of your Desktop.

12. What command did you just use to get that information?

13. Go to the Centralia_Case_Study directory. What command did you use to do that?

14. List the files and folders that you see in the Centralia_Case_Study directory:


15. You can go up one level in your directories by typing cd .. (note there is a space between the cd and the two periods). Go ahead and do that. Which directory are you in now?

16. cd back to the Centralia_Case_Study directory, this time trying the filename completion trick. Type cd Cen and then hit tab. The terminal should autofill the rest of your folder name. Hit return to change to that directory, then cd back to the Desktop.

Part Four: THE QIIME Workflow Worksheet

Names:

1. Double-click on the Centralia_Case_Study folder on your Desktop to open it.

2. In the Centralia_Case_Study folder, you should see three files: catfna, catqual and mapping_centralia.txt.

3. Double-click on the mapping file to open it. It will contain a chart (similar to the one shown below) that contains a whole bunch of information about each metagenomic sample.

Sample ID: A name that I have given to each borehole sample, in this case S1, S2 and S3

Barcode Sequence: A short DNA sequence that is added to the 5’end of every PCR fragment in a sample. So, every PCR fragment from S1 will start with the sequence ACGAGTGCGTA. This will let the QIIME program sort the sequences based on which borehole they came from in later analyses.

Linker Primer Sequence/Primer: This is the sequence of the PCR primer that was used to sequence the 16S rRNA genes. In each sequence, it will be found immediately after the barcode sequence. 8F is the name of the primer used.

Sulfate, Sulfur, Ammonia, Nitrate, pH, Temp: Chemical analyses were performed on each of the borehole samples, and these are the results. Chemical concentrations are given in parts per million (ppm). If you look at sample S1, you will see that it has 250 ppm sulfate and came from the 52°C borehole.

File name and description: Sample identifiers that are needed by QIIME, but are not relevant to your analyses.
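To make the role of the mapping file concrete, here is a minimal Python sketch that reads a tab-separated table of this kind and builds a lookup from barcode to sample. It is a simplified stand-in, not the real mapping_centralia.txt: only three columns are shown, only the two samples whose barcodes are quoted in this handout (S1 and S2) are included, and the function name read_mapping is made up for the example. The #SampleID and BarcodeSequence column names follow the usual QIIME mapping-file convention.

```python
import csv
import io

# Trimmed-down, illustrative mapping table (assumption: the real file has
# more columns and a third sample, which are deliberately left out here).
mapping_text = (
    "#SampleID\tBarcodeSequence\tTemp\n"
    "S1\tACGAGTGCGTA\t52\n"
    "S2\tAGACGCACTCA\t37\n"
)


def read_mapping(handle):
    """Return {sample_id: {column_name: value}} from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    return {row[0]: dict(zip(header, row)) for row in reader if row}


samples = read_mapping(io.StringIO(mapping_text))

# Reverse lookup: which sample does a given barcode belong to?
barcode_to_sample = {
    info["BarcodeSequence"]: sample_id for sample_id, info in samples.items()
}

print(samples["S1"]["Temp"])
print(barcode_to_sample["AGACGCACTCA"])
```

A lookup like barcode_to_sample is the kind of information a demultiplexing step needs in order to sort mixed reads by borehole.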

What is the barcode sequence for the 60°C sample? What is its ammonia content in ppm?

4. Catfna is a file that contains all of the metagenomic sequence data from all three study sites (that is why the barcodes are needed!) in a format called “FASTA”. Go ahead and double click on that file in your desktop folder and take a look. If the computer does not open the file right away, choose ‘Open with’ and Text Edit (found in Applications). You should see thousands of sequences that look like this:

>HD4AU5D04IK2EL#AGACGCACTCA
AGACGCACTCAGAGTTTGATCATGGCTCAGAATCAAACGCTGGCGGCGCGCTTAACACATGC

The first line contains a unique sequence ID (assigned by the sequencer), followed by the sample barcode. In this case the barcode is AGACGCACTCA, which lets QIIME know that this sequence came from S2 (the 37°C borehole – see the chart above). The second line is the sequence of the metagenomic sample itself, starting with the barcode, then the 8F primer, and then the 16S rRNA sequence itself. As you can see, it is quite short. The length of these sequences can vary quite a bit, ranging from just a few bases to a few hundred.

On the sequence above, underline the barcode sequence and circle the primer sequence (you can find that in your mapping file!).
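The way a barcode ties a read to its borehole can be sketched in Python as follows. This is an illustration of the idea only, not QIIME's own code: the function name assign_sample and the BARCODES dictionary are hypothetical, and the dictionary holds just the two barcodes quoted in this handout.

```python
# One FASTA record from catfna, as shown in the handout: a header line
# beginning with ">" followed by the sequence line.
record = """>HD4AU5D04IK2EL#AGACGCACTCA
AGACGCACTCAGAGTTTGATCATGGCTCAGAATCAAACGCTGGCGGCGCGCTTAACACATGC"""

# Hypothetical lookup containing only the barcodes quoted in the text.
BARCODES = {"ACGAGTGCGTA": "S1", "AGACGCACTCA": "S2"}
BARCODE_LENGTH = 11  # our barcodes are 11 bases long


def assign_sample(fasta_record, barcodes, barcode_length):
    """Return (read_id, sample, sequence_without_barcode) for one record.

    sample is None when the leading bases match no known barcode.
    """
    header, sequence = fasta_record.strip().split("\n", 1)
    read_id = header[1:].split("#")[0]      # drop ">" and the "#barcode" tag
    sequence = sequence.replace("\n", "")   # join wrapped sequence lines
    barcode = sequence[:barcode_length]
    sample = barcodes.get(barcode)
    return read_id, sample, sequence[barcode_length:]


read_id, sample, trimmed = assign_sample(record, BARCODES, BARCODE_LENGTH)
print(read_id, sample)
print(trimmed)
```

What remains after the barcode is removed begins with the primer, followed by the 16S rRNA gene sequence.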

5. Catqual is a file that contains a numerical quality score, called the Phred score (Q), for each of the bases in the sequence. The formula for Q is shown below (P is the probability of a base calling error):

$Q = -10 \log_{10} P$

So, a Phred score of 10 means that there is a 1/10 chance of an incorrect base call (90% accuracy) and a Phred score of 40 means that there is a 1/10,000 chance of an incorrect base call. If a sequence does not have adequate quality scores (we will use 20, or 99% accuracy, as the cut-off), QIIME will filter it out of the downstream analyses.
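The numbers quoted here follow from the standard Phred definition, Q = -10 log10(P). The short Python sketch below converts in both directions so you can check other scores yourself; the two function names are made up for this example.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to an error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 40):
    p = phred_to_error_probability(q)
    print(f"Q={q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")
```

Each increase of 10 in Q corresponds to a ten-fold drop in the chance of an incorrect base call.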

6. Open the catqual file in your Centralia_Case_Study desktop folder with Text Edit. Compare the third sequence (HD4AU5D04IXVDY) with the fourth sequence (HD4AU5D04IXVNZ). Based on their Phred scores, which sequence do you think is the most reliable? Which is the least reliable? Justify your answers.


7. If you were to go ahead to use sequences with low Phred scores, how do you think that would impact your final phylogenetic analysis?

Part Five: Using QIIME to quality trim the sequence files and to split them into their respective sample site categories.

1. Go back to the terminal and make sure you are in the Desktop directory. In order to use your EC2 instance, you must first set the permissions for your QIIME-1.pem key. Simply type the following command into your terminal window exactly as shown and hit return.

chmod 400 QIIME-1.pem
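If you are curious what the number 400 means, the Python sketch below decodes it using the standard stat module. The mode is an octal number: 4 gives the file's owner read permission, and the two zeros give no permissions to the group or to anyone else. ssh generally refuses to use a private key file that other users are able to read, which is why this step comes first. Nothing here touches your actual key file; it only interprets the number.

```python
import stat

mode = 0o400  # the same octal value passed to chmod

# Render the mode the way "ls -l" would show it for a regular file.
symbolic = stat.filemode(stat.S_IFREG | mode)

owner_can_read = bool(mode & stat.S_IRUSR)
owner_can_write = bool(mode & stat.S_IWUSR)
others_have_access = bool(mode & (stat.S_IRWXG | stat.S_IRWXO))

print(symbolic)
print(owner_can_read, owner_can_write, others_have_access)
```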

2. Next, you will need to connect to your EC2 instance via the terminal. Type the following command:

ssh -i "QIIME-1.pem" ubuntu@<the address provided by your instructor>

and hit return. Your command should look something like this:

ssh -i "QIIME-1.pem" ubuntu@ec2-54-173-47-47.compute-1.amazonaws.com

3. When you get the warning text, type yes to tell the computer it is ok to proceed.


4. You should now be in the home directory of your EC2 instance. List the contents of your EC2 instance (ls). Hopefully you will see Centralia_Case_Study there. If not, we will need to go back to Step Two and upload the folder with Cyberduck.

5. cd to the Centralia_Case_Study folder. You will do all of your analyses from there. List the contents to verify that you have the catqual, catfna and mapping_centralia.txt files correctly uploaded.

6. We will now ask QIIME to quality trim the sequence files (those below 20 will be discarded at this step) and to sort the sequences, which are all mixed together at this point, into groups by sample location (S1, S2 or S3, in this case). The command for this is:

split_libraries.py.

However, in order for QIIME to perform this analysis, you will need to provide it with other information, as well. Specifically, you need to tell it what your input file is (catfna), what your mapping file is (mapping_centralia.txt), what your quality file is (catqual), how many bases your barcodes have, if they are different from the default value of 12 (we have 11 bases), and, finally, what to call your output file. We will use split_library_centralia_output.

The overall command goes as follows:

split_libraries.py -m mapping_centralia.txt -f catfna -q catqual -b 11 -o split_library_centralia_output

-m designates the mapping file
-f designates the input FASTA file
-q designates the input quality file (Phred scores)
-b defines the length of the barcode sequence in bases
-o designates the output directory (the new folder that QIIME creates)

Type the command above exactly as it is written…making sure that spaces, underscores, etc. are all correct, then hit return. When QIIME has finished its work, you will see the $ prompt again.
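To get a feel for what a quality filter does, here is a rough Python sketch that reads records in the style of a quality file and keeps only reads whose average Phred score reaches the cut-off of 20 used in this case study. It is not QIIME's implementation: filtering on the mean score is an assumption made for this sketch, the function names are hypothetical, and the two records (read_A and read_B) and their scores are invented for illustration.

```python
# Invented example records: a header line, then space-separated Phred scores.
qual_records = [
    ">read_A\n38 37 36 30 25",
    ">read_B\n22 18 15 12 10",
]


def parse_qual_record(record):
    """Return (read_id, [scores]) from one quality record."""
    header, *score_lines = record.strip().splitlines()
    read_id = header[1:].split()[0]
    scores = [int(s) for line in score_lines for s in line.split()]
    return read_id, scores


def passes_quality(scores, min_mean_quality=20):
    """True if the read has scores and their mean meets the cut-off."""
    return bool(scores) and sum(scores) / len(scores) >= min_mean_quality


kept = []
for record in qual_records:
    read_id, scores = parse_qual_record(record)
    if passes_quality(scores):
        kept.append(read_id)

print(kept)
```

Reads that fail the check are simply left out of everything that follows, so they never reach the phylogenetic analysis.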

7. List the contents of the Centralia_Case_Study directory again. If everything worked, you should see a new folder. What is the name of that folder?

8. cd to that folder and list its contents…what three files are there?

9. Congratulations: you have just completed the first step of this QIIME workflow! During the next class you will do the rest of the analyses. Don’t worry…it is simpler than it looks!
