Take Home Assignment # 2
Use the hierarchy browser search tool to find your assigned genus in the RDP. Set the size to >1200 nt, since you want to use only close to full-length sequences. Please hand in a list (not a printout) of the hierarchical "phylogenetic steps" to your assigned genus, starting with the Domain. Identify what type each step is (for example, the first step is a type of Domain). Retrieve all the sequences within the genus, either from the RDP or from any of the other databases you've learned to use in this course (regardless of whether the individual species within the genus actually include your genus name as part of their name). Note: Because the RDP is an aligned database, it is important to check "remove all gaps" when downloading sequences from it. Find your genus again, only this time set the size to both. How many total sequences are in this genus, including the short partials? Note: Do not retrieve any of these shorter sequences; use only the >1200 nt sequences for the rest of the take home. (6 pt)
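The gap removal and length cutoff described above can be sketched in a few lines. This is only an illustration, assuming the sequences were saved as FASTA text and that gaps appear as "-" or "." characters; the function names are hypothetical and are not part of the RDP tools.

```python
def parse_fasta(text):
    """Return a list of (label, sequence) pairs from FASTA-formatted text."""
    records, label, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def ungap(seq):
    """Strip alignment gap characters (assumed here to be '-' and '.')."""
    return seq.replace("-", "").replace(".", "")


def near_full_length(records, min_len=1200):
    """Keep only sequences longer than min_len nt once gaps are removed."""
    kept = []
    for label, seq in records:
        clean = ungap(seq)
        if len(clean) > min_len:
            kept.append((label, clean))
    return kept
```

Counting `len(parse_fasta(text))` against `len(near_full_length(...))` gives the same kind of comparison as the "both" versus ">1200" size settings.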
Align the retrieved sequences that fit the above criteria using CLUSTALW. Hand in a copy of the CLUSTALW alignment. Be sure that each sequence is clearly identified (by name, NOT by number), either by editing the original sequence labels (in the sequence text box, NOT the label line) or by handwriting the full name next to each sequence identifier. The computer will sometimes chop off the labels, which produces identical labels for some clones with longer names where the unique identifiers fall only at the end of the name. (3 pt)
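To spot which labels will collide before running the alignment, a quick check like the one below can help. It is a minimal sketch: the truncation width is passed in as a parameter because the actual cutoff depends on the program and version you use, and the function name and example labels are hypothetical.

```python
from collections import defaultdict


def find_label_collisions(labels, width):
    """Group full labels by their first `width` characters.

    Returns only the truncated labels shared by more than one sequence.
    """
    groups = defaultdict(list)
    for label in labels:
        groups[label[:width]].append(label)
    return {short: full for short, full in groups.items() if len(full) > 1}


# Hypothetical labels: the two clones differ only at the end of the name.
labels = ["uncultured_clone_A1", "uncultured_clone_B7", "Runella_slithyformis"]
collisions = find_label_collisions(labels, width=16)
```

Any label listed in `collisions` needs to be edited (or written in by hand) so the sequences stay distinguishable.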
Run a second alignment with E. coli 16S rRNA (or another bacterial 16S rRNA, as long as it is outside of your group) as one of the sequences. Don't hand this in (-2 pt if you do). Comparing the consensus sequence from the first alignment to this second alignment should help you find regions that might be unique signature sequences for your genus. Briefly explain why this second alignment should help you to determine these regions.
Why might you choose to use another bacterial 16S rRNA sequence instead of E. coli? If you had a picture of the secondary structure of the bacterial 16S rRNA showing which areas are highly conserved and which are highly variable (like the one I showed in lecture), which of these areas would be a better place to search for a signature sequence for your genus? Select 3-4 different potential sequences (potential probe targets; see the tips below). Highlight each of your selected signature sequences on the CLUSTALW alignment pages you are handing in and number them 1-3 or 4. Run each candidate through Probe Match to determine its usefulness. Hand in a copy of the Probe Match information (be sure it is set to both) on your signature sequence candidates. Be sure the printout includes the number of hits in the genus (I don't need to see which members of the genus match) and the corresponding number from the highlighted region on the alignment. On this printout, also write down the total number of hits you get for the probe when you allow for 1 mismatch (do not print this out, as it can get long; just write it on the initial printout). Rerun each signature sequence through Probe Match, restricting the search to sequences with data in the region of this signature sequence. (Pick a region from 10-20 nt before the start of your signature sequence to 10-20 nt after the end. Think: what information have you generated that will make it easy to identify where this would be on the E. coli sequence?) If you lose hits within your genus, try widening the window by a few more nucleotides on each end until you get them back. Hand in the output from this restricted run just as you did with the initial Probe Match run; however, somewhere on each printout write down the numbers for the region you restricted the run to. Also, hand in a paragraph analyzing each of your Probe Match results. Which one of your signature sequences do you believe is the best? Explain why you consider this signature sequence to give you a better probe than each of the other options.
Be sure to include in your argument the information you gained by allowing for a mismatch and by restricting the search area. The signature sequence you selected may be the best of your 3-4 candidates; however, in the real world, would you consider this probe search a success? Include an explanation as to why you do or don't believe the selected sequence will give you a useful probe for your group. (15 pt)
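To build intuition for what the mismatch setting and the restricted region do to the hit counts, here is a toy sketch. It is not the RDP Probe Match algorithm: it assumes ungapped sequences on the same strand as the signature sequence, plain 0-based string positions rather than E. coli numbering, and hypothetical function names.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_hit(target, seq, start=0, end=None):
    """Lowest mismatch count for `target` anywhere in seq[start:end].

    Returns None when the window is too short to contain the target,
    i.e. the sequence has no usable data in that region.
    """
    region = seq[start:end]
    k = len(target)
    if len(region) < k:
        return None
    return min(mismatches(target, region[i:i + k])
               for i in range(len(region) - k + 1))


def count_hits(target, seqs, max_mismatch=0, start=0, end=None):
    """Count sequences that match `target` with at most max_mismatch mismatches."""
    hits = 0
    for seq in seqs:
        best = best_hit(target, seq, start, end)
        if best is not None and best <= max_mismatch:
            hits += 1
    return hits
```

Calling `count_hits` once with `max_mismatch=0` and once with `max_mismatch=1` mirrors the two numbers you are asked to write on the printout. A large jump between them suggests the candidate is only one substitution away from sequences it does not match exactly.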
Construct phylogenetic trees for your assigned group of organisms using two different tree methodologies (not just a difference in appearance, as with rooted versus unrooted trees). Hand in the trees along with the name and description of the tree methodology (not the program name) used for each. Be sure it is clear how these two methodologies differ. Please evaluate your phylogenetic results, including the following in the evaluation discussion: Did either of your methods give you multiple trees? Why would this occur? Do your two trees truly differ? Would you expect them to differ? Explain. Are the branch lengths valid for either of your trees? If so, which? Do you prefer one tree over the other? For your group, speculate as to whether selecting "correct for multiple substitutions" as one of your analysis parameters would make a difference. Explain your answer. (15 pt)
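As a way to see what a multiple-substitution correction does numerically, the sketch below compares the raw proportion of differing sites (p) with the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3). Jukes-Cantor is used here only as a familiar example; the assignment does not say which correction your tree program applies, and the function names are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For small p the two values are nearly identical, and the gap between them grows as p increases. That is one angle to consider when you speculate about whether the correction matters for sequences as closely related as members of a single genus.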
It is generally necessary to manually edit your aligned sequences prior to phylogenetic analysis, or to use a mask during the analysis. While I don't want a nucleotide-by-nucleotide edit for this assignment, there are some simple types of edits that may be required for your alignment. Please perform this/these edit(s), but do not print the edited alignment out. Either mark your edits on the CLUSTALW alignment you generated above (with the edited areas circled) and give a brief explanation as to why you performed the edits, or, if you believe no edits were needed, turn in a paragraph explaining why. Construct a new phylogenetic tree with the edited alignment using either of the two methodologies you used above, and hand in this new tree (clearly labeled as edited) with answers to the following question(s). Is the "edited" tree different from the original tree, or is it the same? If they are different, which do you think is more valid, and why? (5 pt)
You also need to run a bootstrap analysis on your phylogenetic groupings. Please hand in a hard copy of this computer analysis (be careful that all necessary information is included if you don't hand in the entire printout) and write the calculated bootstrap values at the appropriate nodes on one of the trees that you are handing in. Use whichever of the trees has the same groupings as the neighbor-joining tree that the bootstrap program generates. If they are completely different, then sketch out the bootstrap neighbor-joining tree by hand and write the bootstrap values on this tree. Be sure to state how many random trees were tested, so we know whether a value such as 89 is out of 100 trees or 1000. On the tree, note any branches (groupings) that are not valid according to the bootstrap analysis. (6 pt)
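The resampling step behind a bootstrap analysis, and the reason the replicate count matters when reading a raw value, can be sketched as follows. This is a simplified illustration with hypothetical function names. It only resamples alignment columns; building a tree from each replicate is left to your tree program.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement.

    `aligned` is a list of equal-length aligned sequences; the result has
    the same number of sequences and the same number of columns.
    """
    length = len(aligned[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in aligned]


def support_percent(times_recovered, replicates):
    """Bootstrap support for a grouping, as a percentage of replicates."""
    return 100.0 * times_recovered / replicates


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
replicate = bootstrap_alignment(["ACGTAC", "ACGTTC", "ACCTAC"], rng)
```

A grouping recovered 89 times gives `support_percent(89, 100)` of 89% but `support_percent(89, 1000)` of only 8.9%, which is why the number of replicates has to be stated.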
Probe Design tips
1. Probes, optimally, should be about 18-25 nucleotides in length, but some are as short as 15 nucleotides and others are longer than 25.
2. If most of your sequences have a particular nucleotide (say a T, for example) at a site, but one or two of the sequences have an N at that site (meaning it could be any nucleotide), go ahead and design the probe with that nucleotide (the T in my example).
3. If you are having considerable trouble finding consensus sequence regions long enough for probes, expand your options by first checking to see if some base uncertainties are masking consensus. The following IUPAC abbreviations may be used within your sequences: R for A or G, W for A or T, S for G or C, M for A or C, Y for C or T, and K for G or T. Consider then that an R may be in consensus if the other sequences all have A or all have G at that position.
4. If necessary, design the probes using an ambiguous base like R or W (only 1 ambiguous base for probes shorter than 20 nt, or 2 for probes over 20 nt).
5. Your phylogenetic trees may show that 1 sequence or a small group of sequences is more distantly related to the rest of the sequences. If this sequence or group is causing problems in finding a consensus region for a probe, go ahead and design your probe for just the main group of sequences. You will need to explain this in your paragraph on the probe.
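Tips 2-4 can be checked mechanically. The sketch below is only an illustration with hypothetical function names: it treats each IUPAC code as the set of bases it can stand for, reports a column as being in consensus when exactly one base is compatible with every sequence, and counts the ambiguous bases in a candidate probe. The tips do not say what limit applies to a probe of exactly 20 nt, so the stricter limit of 1 is assumed here.

```python
# Bases each symbol can stand for (the codes listed in tip 3, plus N).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "W": {"A", "T"}, "S": {"G", "C"},
    "M": {"A", "C"}, "Y": {"C", "T"}, "K": {"G", "T"},
    "N": {"A", "C", "G", "T"},
}


def column_consensus(column):
    """Return the single base compatible with every symbol in a column.

    `column` is a string holding one symbol per sequence. Returns None
    when zero bases, or more than one base, fit all of the sequences.
    """
    shared = {"A", "C", "G", "T"}
    for symbol in column:
        shared &= IUPAC[symbol.upper()]
    return shared.pop() if len(shared) == 1 else None


def ambiguity_ok(probe):
    """Check tip 4: at most 1 ambiguous base, or 2 for probes over 20 nt.

    Assumption: a probe of exactly 20 nt gets the stricter limit of 1.
    """
    ambiguous = sum(1 for base in probe.upper() if base not in "ACGT")
    limit = 2 if len(probe) > 20 else 1
    return ambiguous <= limit
```

For example, `column_consensus("TTNT")` gives "T" (tip 2) and `column_consensus("AARA")` gives "A" (tip 3), while `column_consensus("AG")` gives None because the sequences genuinely disagree at that position.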
Genus names of organisms for Take Home Assignment # 2
1. Natronorubrum
2. Asticcacaulis
3. Marinospirillum
4. Methanimicrococcus
5. Thermovibrio
6. Piscirickettsia
7. Anaerolinea
8. Pontibacter
9. Thermocladium
10. Cardiobacterium
11. Streptobacillus
12. Gelidibacter
13. Deferribacter
14. Phaeospirillum
15. Antarctobacter
16. Aequorivita
17. Methylosarcina
18. Pelagibaca
19. Sandarakinorhabdus
20. Azovibrio
21. Runella
22. Roseivivax