Multiple Sequence Alignments
Multiple Sequence Alignments – Lab 2.2
1.
We will be using programs
housed on the Mobyle portal from the Pasteur Institute.
Note: you can use this portal as a guest, but your work
will only be stored for a limited time. Results will be e-mailed
to you, but you will not be able to use them from another computer
unless you save and upload them.
Registration for the site is free and allows you to store
your bookmarked data and results on the server so they are
accessible from any computer as long as you remember your
password! If you
want to register, or are already registered, go to
http://mobyle.pasteur.fr/cgi-bin/portal.py
to sign in.
2.
On the left-hand side,
under “programs”, click on the box in front of “alignment” in
the list of programs to expand that topic. Then click on
“multiple” and finally “clustalw-multialign”.
a.
Upload your file of sequences, or alternatively paste them into
the box with a line between each sequence and each with a header
of the form: >SeqName/ID. The file will work best if it is in a
plain text format, such as one saved from a text editor like
Notepad.
If you have a Word document
or some other file format, you can upload it to the
Convert Files website (http://www.convertfiles.com/)
and select the input format and .txt as the output to get it into
the proper format. Sometimes it is
necessary to edit the names of your sequences: some of the
programs cut off the names at 10 characters or at the first
space, and this can lead to confusion if one or more names begin
with identical terms. This can be done either in the text
editor or in the alignment window once the file is
uploaded.
Once you have uploaded the file to the portal, if you are
registered, you should be able to select this file from the
dropdown menu to the right of the “Choose file” button.
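The name truncation described above can be checked before uploading. The following is a minimal Python sketch, assuming names are cut at the first space and then at 10 characters, as the note says; the sequence names and the helper functions are made up for illustration and are not part of the portal.

```python
from collections import defaultdict

def read_fasta_names(text):
    """Return the header lines (without '>') from FASTA-formatted text."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]

def truncated_name(name, max_len=10):
    """Mimic a program that keeps the text before the first space,
    cut to max_len characters (10 is the figure quoted in the handout)."""
    parts = name.split()
    first = parts[0] if parts else ""
    return first[:max_len]

def find_collisions(names, max_len=10):
    """Group full names by their truncated form; keep groups with more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[truncated_name(name, max_len)].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}

# Hypothetical input: two names that share their first 10 characters.
example = """>Methanococcus_jannaschii 16S
ACGU
>Methanococcus_vannielii 16S
ACGA
>E.coli 16S
ACGG
"""

collisions = find_collisions(read_fasta_names(example))
for short, full in collisions.items():
    print(f"{short!r} would stand for: {', '.join(full)}")
```

Any name reported here is worth shortening by hand (for example to "M.jannasch" and "M.vanniel") before running the alignment.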
b.
If you want to change the default parameters, you must click on
the advanced options box. We will discuss these options in
class. [Time permitting, try rerunning the alignment with a
different weight matrix and/or different gap penalties to see
the effect on the alignment.]
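To get a feel for what a gap penalty does, here is a small Python sketch of a pairwise global alignment (Needleman-Wunsch) with a simple match/mismatch score and a linear gap penalty. This is not the ClustalW procedure, which aligns many sequences progressively and uses weight matrices and separate gap opening and extension penalties; the scores and the two short sequences below are assumptions chosen only for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Hypothetical sequences; compare a mild and a harsh gap penalty.
seq1, seq2 = "GAUUACAGG", "GAUACGG"
for gap in (-1, -5):
    x, y, s = needleman_wunsch(seq1, seq2, gap=gap)
    print(f"gap penalty {gap}: score {s}")
    print(x)
    print(y)
```

Harsher penalties tend to push the alignment toward fewer gaps and more mismatched columns, which is the kind of change to look for when rerunning ClustalW with different settings.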
c.
Run the program by clicking on “Run”. If you want to print the
alignment (for example, for your take-home), you can select
“download” from the options above the results.
You could also choose “back to form” if you want to
change the parameters. Under
the alignment file box, you can click on “full screen” for easier
viewing.
d.
Click on bookmark to save this alignment for future use.
e.
Although this alignment is fine “as is”, finding consensus areas
(possible signature sequences) can be easier with coloring
or shading. To do this,
use the drop-down box next to “further analysis” below the
alignment file box.
This will use the output of the Clustal alignment as the
input for your next process. Select “boxshade” and change the
parameters to give you a ruler, a consensus line (you will
probably want to change the cut-off for this) and your desired
color scheme for viewing. Try
several to figure out what works best for you.
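The effect of the consensus cut-off can be illustrated with a short Python sketch. It assumes one simple rule: a column gets a consensus letter only if the most common residue appears in at least the cut-off fraction of all sequences. Boxshade's own rule and output symbols may differ; the toy alignment and the "." and "-" symbols here are illustrative choices.

```python
from collections import Counter

def consensus(aligned, cutoff=0.5, gap_chars="-."):
    """Return a consensus string for equal-length aligned sequences.

    A column shows its most common residue when that residue occurs in at
    least `cutoff` of all sequences, '.' when no residue reaches the
    cut-off, and '-' when the column contains only gaps.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        residues = [c.upper() for c in column if c not in gap_chars]
        if not residues:
            out.append("-")
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(column) >= cutoff else ".")
    return "".join(out)

# Hypothetical four-sequence alignment.
toy = ["ACGU-A",
       "ACGUGA",
       "ACCU-A",
       "UCGUGA"]

for cutoff in (1.0, 0.75, 0.5):
    print(cutoff, consensus(toy, cutoff))
```

With a cut-off of 1.0 only the fully conserved columns keep a letter; lowering the cut-off fills in more of the consensus line, which is why it is worth trying more than one value.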
3.
Let’s compare the
CLUSTALW alignment to one we get using a different alignment
tool, MUSCLE. (You
may want to do this in a new window for an easier comparison
with CLUSTALW.) Go
back to the list of programs and select “muscle”.
a.
Click on “upload”; you
should now be able to select
the original sequence .txt file from the dropdown menu to the
right of the “Choose file” button.
Hit “select” to load the file.
b.
Again, if you want to change the default parameters, you must
click on the advanced options box. We will discuss these options
in class. One
default you must change is the output format: the “fasta” default
shows the sequences individually and not in columns, as is
typical for alignments.
Changing it to “muscle” or “phylip” works.
c.
Click on bookmark to save this alignment for future use.
d.
You can use the drop down box below the alignment file box next
to “further analysis” just like you did above.
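If you do end up with an alignment in the “fasta” output format from step 3b, the gapped sequences are still aligned; they are just listed one after another. The Python sketch below shows one way to print such a file in column blocks. The function names, block width, and example sequences are assumptions for illustration, and the layout is a generic one rather than the exact “muscle” or “phylip” format.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the name.
            name = (line[1:].split() or ["unnamed"])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def format_blocks(records, width=60, name_width=12):
    """Lay out aligned sequences in blocks of `width` columns, one row per sequence."""
    if not records:
        return ""
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences are not aligned (lengths differ)")
    lines = []
    for start in range(0, length, width):
        for name, seq in records:
            lines.append(f"{name[:name_width]:<{name_width}} {seq[start:start + width]}")
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip("\n")

# Hypothetical aligned FASTA with sequences wrapped over two lines.
aligned_fasta = """>seqA
ACGU--
ACGU
>seqB
ACGUGG
ACGU
"""

records = parse_fasta(aligned_fasta)
print(format_blocks(records, width=5, name_width=6))
```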
4.
Analyze your alignments.
a.
How do the two
alignments (MUSCLE and CLUSTALW) compare?
Give the approximate numbers (with respect to the
E. coli numbering) for 3-4 areas where
they differ (if they
do) and briefly describe the difference.
b.
Within a given alignment, do the sequences start and end in the
same place? Why do you suppose this is? Do you think
this affects your alignments?
c.
Scanning your alignments, you should see both variable and
conserved regions. Why are both of these features
important?
d.
The region between 1300 and 1400 (E. coli numbering)
contains an area of signature sequence that is considered
universal. Find it and write down at least 10 nt from this
conserved region (assume N's are likely conserved nt).
e.
Give the numbers (from the consensus sequence) for a couple of
regions (size doesn't matter) where Eukarya and Archaea (Methanococcus
and Pyrodictium) have sequence in common but the
Bacterial sequences (E. coli and M. scandinavica)
are different. Give the numbers for a couple of regions
that Archaea and Bacteria share in common. Do likewise for
Eukarya and Bacteria. Was the last one harder to find?
Why do you suppose that is?
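Several of the questions above refer to E. coli numbering, which counts residues in the E. coli sequence and skips alignment columns where E. coli has a gap. The Python sketch below shows how alignment columns can be translated to and from that numbering. It assumes the E. coli sequence in your file starts at position 1 of the numbering scheme (a partial sequence would need an offset); the function names and the tiny alignment are made up for illustration.

```python
def column_to_reference(aligned_ref, gap_chars="-."):
    """For each alignment column, give the 1-based position in the ungapped
    reference, or None where the reference has a gap."""
    mapping = []
    position = 0
    for ch in aligned_ref:
        if ch in gap_chars:
            mapping.append(None)
        else:
            position += 1
            mapping.append(position)
    return mapping

def reference_to_column(aligned_ref, position, gap_chars="-."):
    """Return the 0-based alignment column holding reference residue `position`."""
    for column, ref_pos in enumerate(column_to_reference(aligned_ref, gap_chars)):
        if ref_pos == position:
            return column
    raise ValueError(f"reference has no position {position}")

def extract_region(alignment, ref_name, start, end):
    """Slice every sequence between reference positions start and end (inclusive)."""
    ref = alignment[ref_name]
    first = reference_to_column(ref, start)
    last = reference_to_column(ref, end)
    return {name: seq[first:last + 1] for name, seq in alignment.items()}

# Hypothetical two-sequence alignment; "E.coli" plays the reference.
toy_alignment = {"E.coli": "AC--GU-A",
                 "Other":  "ACGGGUUA"}

print(column_to_reference(toy_alignment["E.coli"]))
print(extract_region(toy_alignment, "E.coli", 2, 4))
```

On a real alignment, a call such as extract_region(alignment, name_of_E_coli_entry, 1300, 1400) would pull out the columns asked about in question d, provided the E. coli entry is full length.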
5.
Try at least one more
alignment, using MAFFT (a fast Fourier transform method) or
DIALIGN (a block-based method).
Is this alignment the same as or different from the
other two?
Looking them over, do you have a preference for any of the
3 or 4 formats
you tried?
If yes, why?
What other information could you use to determine which is the
best alignment?
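When comparing alignments from different programs by eye becomes tedious, one simple numerical check is to ask how many residue pairings the two alignments have in common. The Python sketch below does this; it measures agreement only and says nothing about which alignment is better. The function names and the two toy alignments are assumptions for illustration.

```python
from itertools import combinations

def aligned_pairs(alignment, gap_chars="-."):
    """Return the set of residue pairs placed in the same column.

    `alignment` maps sequence names to gapped sequences of equal length.
    Each residue is identified by (name, index in the ungapped sequence).
    """
    names = sorted(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("sequences are not aligned (lengths differ)")
    n_columns = lengths.pop() if lengths else 0
    counters = {name: 0 for name in names}
    pairs = set()
    for column in range(n_columns):
        present = []
        for name in names:
            if alignment[name][column] not in gap_chars:
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs

def shared_pair_fraction(aln1, aln2):
    """Fraction of the residue pairs in aln1 that are also paired in aln2."""
    pairs1, pairs2 = aligned_pairs(aln1), aligned_pairs(aln2)
    if not pairs1:
        return 0.0
    return len(pairs1 & pairs2) / len(pairs1)

# Hypothetical alignments of the same two sequences by two programs.
program_a = {"a": "ACGU", "b": "AC-U"}
program_b = {"a": "ACGU", "b": "A-CU"}

print(shared_pair_fraction(program_a, program_b))
```

The fraction is computed relative to the first alignment, so swapping the arguments can give a different value when the two alignments contain different numbers of paired residues.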