 Lab 2.2

    Multiple Sequence Alignments

     

    1.     Go to http://www.ebi.ac.uk/Tools/msa/

        This is from the EMBL site of the European Bioinformatics Institute and it contains a number of multiple sequence alignment tools.  Some are only for aligning protein sequences, but most do nucleic acid sequences as well.  We are going to compare the results of several MSA for our dataset. 

    2.        Select “Launch MUSCLE” from the list of programs.

    a.       Upload your file of sequences, or alternatively paste them into the box, with a line between each sequence and each sequence preceded by a header of the form: >SeqName/ID

    b.      Under Step 2, you must change the output format from the default, Pearson/FASTA, in order to see the alignment. The Pearson/FASTA output shows each sequence individually with the gaps introduced by the alignment (a file used as input by some phylogenetic programs). Change the output format to “Phylip interleaved” and hit submit. For a dataset this size the task will generally complete in a couple of minutes, so the e-mail alert isn’t necessary. Once you get the results, download them, as we will need them for Lab 2.3.

    c.       Rerun the program with the output format set to “HTML”. Although “Phylip interleaved” and the remaining formats will also provide you with an alignment view, HTML shades conserved residues (strongly conserved in blue, weakly conserved in gray) for easier viewing. There is, however, no consensus line. Programs you can download, such as Boxshade and Textshade, will shade the residues (nt or amino acids) in different colors based on consensus levels you set. They provide many different viewing options that some people prefer.

    d.      While waiting for this to run, open a new window (not tab) to this site for step 3 below.  
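    The input format from step 2a and the difference between the two output views in step 2b can be illustrated with a minimal Python sketch. The function names parse_fasta and interleave are hypothetical, the toy sequences are made up, and the block layout only imitates an interleaved view; it is not the exact Phylip file that MUSCLE writes.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def interleave(records, width=10):
    """Format aligned (gapped) sequences in blocks so the columns line up."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    total = lengths.pop()
    pad = max(len(name) for name, _ in records) + 2
    blocks = []
    for start in range(0, total, width):
        rows = [name.ljust(pad) + seq[start:start + width]
                for name, seq in records]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)


# Toy aligned output in Pearson/FASTA form: one gapped sequence after another.
aligned = """>SeqA
ACGT-ACGTTGA
>SeqB
ACGTTACG--GA
"""

records = parse_fasta(aligned)
print(interleave(records, width=6))
```

    In the Pearson/FASTA text the two sequences sit one after the other, so matching columns are hard to see; the interleaved printout stacks them block by block, which is why that view is easier to read as an alignment.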

    3.      Let’s compare the MUSCLE alignment to one we get using a different alignment tool.  Select “Launch ClustalW2”, a slightly newer version of the popular CLUSTALW.

    a.       Under Step 1, change the default from Protein to DNA and then enter your sequence file as above in 2a.

    b.      Under Step 2, leave the type at the default “Slow” unless the program bogs down in class, in which case we may switch to “Fast”.  Typically “Slow” will give better results and so “Fast” is only used for much larger datasets where the wait time increases dramatically.  Also leave the defaults for Step 3 for now.  [Time permitting try rerunning the alignment with a different weight matrix and/or different gap penalties to see the effect on the alignment.]

    c.       When the results pop up, scroll through them. The numbers at the end of each line give the nucleotide number (gaps are NOT counted) for that sequence; when I refer to E. coli numbering, this is a good place to determine it. The asterisks at the bottom of each group of aligned sequences show where there is 100% consensus. Note the “Show Colors” button: the coloring is actually based on protein sequences, and so is more effective for protein alignments, but it may help you see some areas of consensus within the DNA alignment.

    d.       For a better color representation, select “Download Alignment File” on your results page (note: this doesn’t work with all browsers). You will get a file that you can upload into MView, described in step 4 below, which provides some color contrast.
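    The gap-free numbering and the asterisk line described in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the two short sequences are invented, and the code does not reproduce ClustalW2's own output format.

```python
def ungapped_position(aligned_seq, column):
    """Return the 1-based residue number at a 0-based alignment column.

    Gaps ('-') are not counted. Returns None if the column holds a gap.
    """
    if aligned_seq[column] == "-":
        return None
    return sum(1 for ch in aligned_seq[:column + 1] if ch != "-")


def identity_line(aligned_seqs):
    """Return '*' under each column where every sequence has the same residue."""
    marks = []
    for column in zip(*aligned_seqs):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# Toy rows standing in for E. coli and one other sequence (made up).
ecoli = "AC-GTTA"
other = "ACGGT-A"

# Column index 3 is the fourth alignment column, but only the third
# E. coli nucleotide, because the gap before it is not counted.
print(ungapped_position(ecoli, 3))
print(ecoli)
print(other)
print(identity_line([ecoli, other]))
```

    The same counting explains why a position given in E. coli numbering usually does not match the column number of the alignment once gaps have been introduced.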

    4.        Select “Launch MView” from the list of programs. Note that this is simply a viewing tool and not an editor. (You may want to do this in a third window.)

    a.       Under Step 1, change the default from Protein to DNA.

    b.      Upload your downloaded file.

    c.       Either leave the Step 2 input format at “Automatic” or change it to the appropriate format. Leave the Step 3 output at the default parameters for now, but feel free to experiment with the different options. Unfortunately, this viewer sets your first sequence as a reference, so the shading is based on whether each residue is the same as or different from the reference. The alignment is numbered, however, and it has 4 different lines of consensus sequence. (At 100%, all the sequences have the same residue at the site, whereas at 70% only 7 out of 10 sequences have that given residue. R stands for purine and Y for pyrimidine.) Why do you suppose they provide 4 different levels of consensus?
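    To make the idea of consensus levels concrete, here is a simplified Python sketch of a thresholded consensus with the R (purine) and Y (pyrimidine) codes. It is not MView's actual algorithm. The function name is hypothetical, the ten toy sequences are invented, and the 90% and 80% levels are an assumption, since only 100% and 70% are named above.

```python
from collections import Counter

PURINES = set("AG")
PYRIMIDINES = set("CTU")


def consensus(aligned_seqs, percent):
    """Return a consensus string at the given threshold (integer percent).

    A column gets a nucleotide if that nucleotide reaches the threshold,
    'R' or 'Y' if purines or pyrimidines together reach it, else '.'.
    """
    n = len(aligned_seqs)
    out = []
    for column in zip(*aligned_seqs):
        counts = Counter(ch.upper() for ch in column if ch != "-")
        base, top = counts.most_common(1)[0] if counts else ("-", 0)
        purines = sum(counts[b] for b in PURINES)
        pyrimidines = sum(counts[b] for b in PYRIMIDINES)
        # Integer arithmetic avoids rounding trouble at exact thresholds.
        if top * 100 >= percent * n:
            out.append(base)
        elif purines * 100 >= percent * n:
            out.append("R")
        elif pyrimidines * 100 >= percent * n:
            out.append("Y")
        else:
            out.append(".")
    return "".join(out)


# Ten made-up aligned sequences, four columns each.
seqs = ["AGCA", "AGCA", "AGCA", "AGCA", "AGCA",
        "AGCC", "AGCC", "AACC", "AATC", "AATC"]

for level in (100, 90, 80, 70):
    print(f"{level:>3}%  {consensus(seqs, level)}")
```

    In the second column 7 of the 10 sequences have G and the rest have A, so the strict levels can only report a purine (R) while the 70% level reports G.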

     

    5.      Analyze your alignments.

     

    a.        How do the two alignments (MUSCLE and ClustalW2) compare?  Give the numbers (with respect to the E. coli numbering) for 3-4 areas where they differ and briefly describe the difference.

    b.      Within a given alignment, do the sequences start and end in the same place?  Why do you suppose this is?  Do you think this affects your alignments?

    c.       Scanning your alignments, you should see both variable and conserved regions. Why are both of these features important?

    d.      The region between 1300 and 1400 (E. coli numbering) contains an area of signature sequence that is considered universal.  Find it and write down at least 10 nt from this conserved region (assume N's are likely conserved nt).

    e.       Give the numbers (from the consensus sequence) for a couple of regions (size doesn't matter) where Eukarya and Archaea (Methanococcus and Pyrodictium) have sequence in common but the Bacterial sequences (E. coli and M. scandinavica) are different. Give the numbers for a couple of regions that the Archaea and Bacteria share in common. Do likewise for Eukarya and Bacteria. Was the last one harder to find? Why do you suppose that's true?
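    For question 5a, one way to locate differences between two alignments by computer rather than by eye is to ask, for every nucleotide of the reference sequence, which nucleotide of another sequence shares its column in each alignment. The Python sketch below does this under stated assumptions: the function names, the dictionary layout, and the sequence names "Ecoli" and "Other" are all hypothetical, the sequences are invented, and getting your downloaded alignments into this form is left to you.

```python
def pairing(ref_aligned, other_aligned):
    """Map each reference residue number to the residue number of the other
    sequence in the same column (None where the other sequence has a gap)."""
    mapping = {}
    ref_pos = other_pos = 0
    for r, o in zip(ref_aligned, other_aligned):
        if o != "-":
            other_pos += 1
        if r != "-":
            ref_pos += 1
            mapping[ref_pos] = other_pos if o != "-" else None
    return mapping


def differing_positions(aln_a, aln_b, ref_name, other_name):
    """Return reference residue numbers where the two alignments pair the
    reference with a different residue (or a gap) of the other sequence."""
    map_a = pairing(aln_a[ref_name], aln_a[other_name])
    map_b = pairing(aln_b[ref_name], aln_b[other_name])
    return [pos for pos in sorted(map_a) if map_a[pos] != map_b.get(pos)]


# Two toy alignments of the same pair of sequences; only the placement
# of the single gap in the reference differs.
muscle = {"Ecoli": "ACGT-ACG", "Other": "ACGTTACG"}
clustal = {"Ecoli": "ACGTA-CG", "Other": "ACGTTACG"}

print(differing_positions(muscle, clustal, "Ecoli", "Other"))
```

    Because the positions are reported in the reference's own gap-free numbering, they can be read directly as E. coli numbers when E. coli is used as the reference.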

     6.     Try at least one more alignment from MAFFT or T-Coffee.  Is this alignment the same as or different from the other two?  Looking them over, do you have a preference for any of the formats?  If yes, why?  What other information would you need to determine which is the best alignment?

     

     

 

 

 


 © 2002 The Board of Regents of the University of Wisconsin System.

Click here to email comments to Scott Cooper regarding this site or its links.