Multiple Sequence Alignments – Lab 2.2

1. We will be using programs housed on the Mobyle portal from the Pasteur Institute. Note: you can use this portal as a guest, but your work will only be stored for a limited time. Results will be e-mailed to you, but you will not be able to use them from another computer unless you save and upload them. Registration for the site is free and allows you to store your bookmarked data and results on the server, so they are accessible from any computer as long as you remember your password! If you want to register, or are already registered, go to http://mobyle.pasteur.fr/cgi-bin/portal.py to sign in.

2. On the left-hand side under “programs”, click on the box in front of “alignment” in the list of programs to expand that topic. Then click on “multiple” and finally “clustalw-multialign”.

a. Upload your file of sequences, or alternatively paste them into the box with a line between each sequence and each with a header of the form: >SeqName/ID. The file will work best if it is in a plain text format, such as one saved from a text editor or Notepad. If you have a Word file or some other file format, you can upload it to the Convert Files website (http://www.convertfiles.com/) and select the input format and .txt for output to get it in the proper format. Sometimes it is necessary to edit the names of your sequences (some of the programs cut off the names at 10 characters or at the first space, and this can lead to confusion if two or more names begin with identical terms). This can be done either in the text editor or in the alignment window once the file is uploaded. Once you have uploaded the file to the portal, if you are registered, you should be able to select this file from the dropdown menu to the right of the “Choose file” button.

b. If you want to change the default parameters, you must click on the advanced options box. We will discuss these options in class. [Time permitting, try rerunning the alignment with a different weight matrix and/or different gap penalties to see the effect on the alignment.]

c. Run the program by clicking on “Run”. If you want to print the alignment (for example, for your take-home), you can select “download” from the options above the results. You can also choose “back to form” if you want to change the parameters. Under the alignment file box you can click on “full screen” for easier viewing.

d. Click on bookmark to save this alignment for future use.

e. Although this alignment is nice “as is”, finding consensus areas (possible signature sequences) can be even easier with coloring or shading. To add it, use the drop-down box next to “further analysis” below the alignment file box. This will use the output of the Clustal alignment as the input for your next process. Select “boxshade” and change the parameters to give you a ruler, a consensus line (you will probably want to change the cut-off for this) and your desired color scheme for viewing. Try several to figure out what works best for you.
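Step 2a notes that some programs cut names off at 10 characters or at the first space. A short Python sketch like the one below (it is not part of the Mobyle portal) can flag names that would collide before you upload your file. The function names, the 10-character default, and the example headers and sequences are made up for illustration; the actual truncation rules vary by program.

```python
from collections import defaultdict


def read_fasta_headers(text):
    """Return the header text (without '>') of every record in FASTA-formatted text."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def truncated_name(header, max_len=10):
    """Mimic a program that keeps the text before the first space, cut to max_len characters.

    This is an assumed rule for illustration, not the behavior of any specific tool.
    """
    parts = header.split()
    first_word = parts[0] if parts else ""
    return first_word[:max_len]


def find_name_clashes(headers, max_len=10):
    """Group headers that collapse to the same truncated name."""
    groups = defaultdict(list)
    for header in headers:
        groups[truncated_name(header, max_len)].append(header)
    return {name: group for name, group in groups.items() if len(group) > 1}


# Invented example records; the sequences are placeholders, not real data.
example = """>Methanococcus_jannaschii 16S
ACGU
>Methanococcus_vannielii 16S
ACGA
>E.coli 16S
ACGG
"""

clashes = find_name_clashes(read_fasta_headers(example))
for short_name, originals in clashes.items():
    print(f"{short_name!r} would be shared by: {originals}")
```

If any clashes are reported, rename those sequences so that the first 10 characters are unique.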

3. Let’s compare the CLUSTALW alignment to one we get using a different alignment tool, MUSCLE. (You may want to do this in a new window for an easier comparison with CLUSTALW.) Go back to the list of programs and select “muscle”.

a. Click on “upload” and now you should be able to select the original sequence .txt file from the dropdown menu to the right of the “Choose file” button. Hit select to load the file.

b. Again, if you want to change the default parameters, you must click on the advanced options box. We will discuss these options in class. One default you must change is the output format: the “fasta” default shows the sequences individually and not in columns, as is typical for alignments. Changing it to “muscle” or “phylip” works.

c. Click on bookmark to save this alignment for future use.

d. You can use the drop down box below the alignment file box next to “further analysis” just like you did above.
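If you do end up with an alignment in “fasta” format (step 3b), the sequences are still aligned; they are just written one after another instead of stacked in columns. The Python sketch below shows one way to stack them into blocks for viewing. It is only an illustration: the function names, the block width, and the tiny example alignment are assumptions, and the layout is a plain approximation rather than the exact “muscle” or “phylip” output format.

```python
def parse_aligned_fasta(text):
    """Parse aligned FASTA text into an ordered list of (name, gapped_sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the name.
            name = (line[1:].split() or ["unnamed"])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def format_blocks(records, width=60, name_width=10):
    """Stack the aligned sequences in blocks of `width` columns, one row per sequence."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is this really an alignment?")
    total = lengths.pop()
    blocks = []
    for start in range(0, total, width):
        rows = [
            f"{name[:name_width]:<{name_width}} {seq[start:start + width]}"
            for name, seq in records
        ]
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks)


# Invented toy alignment, wrapped over two lines per record as FASTA often is.
aligned = ">seqA\nACG-TT\nGA\n>seqB\nAC-GTT\nGA\n"
print(format_blocks(parse_aligned_fasta(aligned), width=4))
```

Each block shows the same columns for every sequence, which is what makes differences between sequences easy to spot by eye.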

4. Analyze your alignments.

a. How do the two alignments (MUSCLE and CLUSTALW) compare? Give the approximate numbers (with respect to the E. coli numbering) for 3-4 areas where they differ (if they do) and briefly describe the difference.

b. Within a given alignment, do the sequences start and end in the same place? Why do you suppose this is? Do you think this affects your alignments?

c. Scanning your alignments, you should see both variable and conserved regions. Why are both of these features important?

d. The region between 1300 and 1400 (E. coli numbering) contains an area of signature sequence that is considered universal. Find it and write down at least 10 nt from this conserved region (assume N's are likely conserved nt).

e. Give the numbers (from the consensus sequence) for a couple of regions (size doesn't matter) where Eukarya and Archaea (Methanococcus and Pyrodictium) have sequence in common but the Bacterial sequences (E. coli and M. scandinavica) are different. Give the numbers for a couple of regions that the Archaea and Bacteria share in common. Do likewise for Eukarya and Bacteria. Was the last one harder to find? Why do you suppose that's true?
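Several of the questions in step 4 refer to E. coli numbering, which counts only the nucleotides of the E. coli sequence and skips the gaps that the alignment inserted into it. The Python sketch below illustrates how that numbering relates to alignment columns, and how you could check your by-eye answers for conserved positions. The function names and the toy alignment are invented for illustration, and treating N as matching anything simply follows the assumption stated in step 4d.

```python
GAP_CHARS = "-."


def reference_to_columns(ref_gapped):
    """Map each 1-based position in the ungapped reference to its 0-based alignment column."""
    mapping = {}
    position = 0
    for column, char in enumerate(ref_gapped):
        if char not in GAP_CHARS:
            position += 1
            mapping[position] = column
    return mapping


def conserved_positions(alignment, ref_name, start, end):
    """List reference positions in [start, end] where all sequences agree.

    Columns containing a gap are never counted as conserved; N is ignored
    when comparing, on the assumption that it hides a conserved nucleotide.
    """
    mapping = reference_to_columns(alignment[ref_name])
    hits = []
    for pos in range(start, end + 1):
        col = mapping.get(pos)
        if col is None:
            continue  # position lies beyond the end of the reference sequence
        column = {seq[col].upper() for seq in alignment.values()}
        if column & set(GAP_CHARS):
            continue
        if len(column - {"N"}) <= 1:
            hits.append(pos)
    return hits


# Invented toy alignment; real 16S rRNA alignments are far longer.
toy = {
    "E_coli": "AC-GUUA",
    "seq2":   "ACAGUCA",
    "seq3":   "AC-GNUA",
}

print(reference_to_columns(toy["E_coli"]))
print(conserved_positions(toy, "E_coli", 1, 6))
```

In the toy example, reference position 3 sits in alignment column 3 (counting from 0) because of the gap before it, which is why ruler numbers and E. coli numbers can drift apart along a real alignment.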

5. Try at least one more alignment using MAFFT (a Fast Fourier Transform method) or DIALIGN (a block-based method). Is this alignment the same as or different from the other two? Looking them over, do you have a preference for any of the 3 or 4 formats you tried? If yes, why? What other information could you use to determine which is the best alignment?

Click here to email comments to Scott Cooper regarding this site or its links.