 Lab 2.1

Web sites:


Exercise 1

A.  Sequence Match

     1.  Which of the sequences (>1200) in the RDP most resemble the sequence given below?

             1 gacgaacgct ggcggcatgc ctaatacatg caagtcgaac gcttttgttt caccgggtgc
           61 ttgcacccac cgagacaaaa gagtggcgga cgggtgagta acacgtgggt aacctgccca
          121 taagaggggg ataacatccg gaaacggatg ctaataccgc atatttccaa ttgtctcctg
          181 acagatggaa aaaaggtggc ttcggctacc gcttatggat ggacccgcgg cgtattagct
          241 agttggtgag gtaatggctc accaaggcga tgatacgtag ccgacctgag agggtgatcg
          301 gccacactgg gactgagaca cggcccagac tcctacggga ggcagcagta gggaatcttc
          361 cgcaatggac gaaagtctga cggagcaatg ccgcgtgagt gaagaaggtt ttcggatcgt
          421 aaaactctgt tgttagagaa gaacaaggat gagagtaact gctcatcccc tgacggtatc
          481 taaccagaaa gccacggcta actacgtgcc agcagccgcg gtaatacgta ggtggcaagc
          541 gttgtccgga tttattgggc gtaaagcgag cgcaggcggt tctttaagtc tgatgtgaaa
          601 gcccccggct caaccgggga gggtcattgg aaactggaga acttgagtgc agaagaggag
          661 agtggaattc cacgtgtagc ggtgaaatgc gtagatatgt ggaggaacac cagtggcgaa
          721 ggcgactctc tggtctgtaa ctgacgctga ggctcgaaag cgtggggagc aaacaggatt
          781 agataccctg gtagtccacg ccgtaaacga tgagtgctaa gtgttggagg gtttccgccc
          841 ttcagtgctg cagctaacgc attaagcact ccgcctgggg agtacgaccg caaggttgaa
          901 actcaaagga attgacgggg acccgcacaa gcggtggagc atgtggttta attcgaagca
          961 acgcgaagaa ccttaccagg tcttgacatc ctttgaccac tctagagata gagctttccc
         1021 ttcggggaca aagtgacagg tggtgcatgg ttgtcgtcag ctcgtgtcgt gagatgttgg
         1081 gttaagtccc gcaacgagcg caacccctat tattagttgc cagcattcag ttgggcactc
         1141 tagtgagact gccggtgata aaccggagga aggtggggat gacgtcaaat catcatgccc
         1201 cttatgacct gggctacaca cgtgctacaa tggatggtac aacgagtcgc aaggtcgcga
         1261 ggccaagcta atctcttaaa gccattctca gttcggattg caggctgcaa ctcgcctgca
         1321 tgaagccgga atcgctagta atcgcggatc agcacgccgc ggtgaatacg ttcccgggtc
         1381 ttgtacacac cgcccgtcac accacgagag tttgtaacac ccgaagtcgg tgaggtaacc
         1441 cttttgggag ccagccgcct aaggtgggac agataattgg ggtg

    2.  Check the information (GenBank flat file accessed via S00 number) for your best match.  What habitat was the organism isolated from?
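The query above is printed in GenBank style, with a position number at the start of each line and a space after every ten bases. If the search form does not accept it as printed, the numbering and spacing can be stripped first. The following is a minimal sketch of that cleanup, not part of the RDP itself; the function names and the record name `unknown_query` are made up for illustration, and only the first two lines of the query are used as the example.

```python
import re

def clean_genbank_sequence(block: str) -> str:
    """Drop position numbers, spaces, and line breaks, keeping only the letters."""
    return re.sub(r"[^A-Za-z]", "", block).lower()

def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with a '>' header line."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

# First two lines of the query sequence, copied as printed in the handout.
example = """
        1 gacgaacgct ggcggcatgc ctaatacatg caagtcgaac gcttttgttt caccgggtgc
       61 ttgcacccac cgagacaaaa gagtggcgga cgggtgagta acacgtgggt aacctgccca
"""

seq = clean_genbank_sequence(example)
fasta = to_fasta("unknown_query", seq)  # hypothetical record name
print(len(seq))
print(fasta)
```

The same two functions work on the full sequence if all of its lines are pasted into `example`.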

B.  Use the Hierarchy Browser (search) to find Methylomonas scandinavica.

    a.  Does this organism belong to the domain Archaea or Bacteria?

    b.  M. scandinavica is found in which phylum?

    c. M. scandinavica is found in which class?

    d. M. scandinavica is found in which order?

    e. M. scandinavica is found in which family?

    f. Where was this organism isolated from?

    g. Download the sequence to your on-line files. Remember to change the defaults to FASTA format and to remove all gaps. You can download multiple sequences at once in a FASTA format that is compatible with many alignment and other programs. Note that this file can be opened only by specific programs, so you will not be able to click on it to see its contents. This download technique will be useful for your take-home. Because this is only one of the sequences you will use for Labs 2.2 and 2.3, and the other sequences will come from other sources, you will also need to copy and paste the FASTA version of the sequence into a text document. To reach it, hover over the S00 number of the sequence and click the option that pops up; when you copy, be sure to include the top line with the name and identifier. Use a plain-text editor, not a word processor such as Word, because a saved word-processing file contains formatting that the programs we use can't handle.
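A quick way to confirm that a pasted FASTA record is intact (header line present, no gap characters left) is to parse the text file yourself. The sketch below is illustrative only: the header `S000000000 hypothetical organism` is a made-up placeholder, not a real RDP identifier, and treating `-` and `.` as the gap characters is an assumption about how aligned sequences are written.

```python
def parse_fasta(text: str) -> dict:
    """Return {header: sequence} for every record in a FASTA-formatted string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            # The top line with the name and identifier was not copied.
            raise ValueError("sequence data found before any '>' header line")
        else:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

def remove_gaps(sequence: str) -> str:
    """Strip alignment gap characters (assumed here to be '-' and '.')."""
    return sequence.replace("-", "").replace(".", "")

# Made-up record for illustration only.
fasta_text = """>S000000000 hypothetical organism
ACGU--ACG.
UUGCA-AA
"""

cleaned = {h: remove_gaps(s) for h, s in parse_fasta(fasta_text).items()}
for header, sequence in cleaned.items():
    print(header, len(sequence), sequence)
```

If the header line was left out when copying, `parse_fasta` raises an error instead of silently returning a nameless sequence.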


Probe Match  (set size to "both")

C.  For each of the following sequences, determine:

    a. For which genus of organisms is the signature sequence or probe specific? [That is, which genus encompasses (contains) the bulk of the matches given?] How many "hits" (the query is complementary to a sequence found in this organism's rRNA) fall into this category out of the total hits? Be sure the settings are correct for your entry; they will differ depending on whether you are entering a signature sequence or a probe sequence.

    b. For each of the following, how many sequences within the genus aren't complementary to the signature sequence/probe?

    c. How many organisms are there in the Domain Bacteria that have zero or one error with the signature sequence/probe? And how many organisms are there within the genus you identified that have zero or one error with the signature sequence/probe?
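To make the idea of "zero or one error" concrete, the sketch below counts mismatches between a short probe's target site and a few sequences. It is a simplified illustration, not the algorithm Probe Match uses: it only counts substitutions at the best-matching position, the probe and the three target sequences are invented, and whether you search with the query as entered or with its reverse complement depends on the strand convention of the tool (searching the reverse complement here is an assumption).

```python
def normalize(seq: str) -> str:
    """Upper-case a sequence and write RNA 'U' as 'T' so comparisons are consistent."""
    return seq.upper().replace("U", "T")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]

def min_mismatches(target: str, site: str):
    """Fewest substitutions between `site` and any same-length window of `target`.

    Returns None when the target is shorter than the site.
    """
    target, site = normalize(target), normalize(site)
    best = None
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if best is None or mismatches < best:
            best = mismatches
    return best

probe = "GCTAATCC"                # invented 8-nt probe
site = reverse_complement(probe)  # the sequence the probe would pair with

targets = {                       # invented target sequences
    "org_a": "TTTGGATTAGCAAA",
    "org_b": "UUUGGAUCAGCAAA",
    "org_c": "TTTTTTTTTTTTTT",
}

scores = {name: min_mismatches(seq, site) for name, seq in targets.items()}
hits = [name for name, m in scores.items() if m is not None and m <= 1]
print(site, scores, hits)
```

Raising or lowering the `m <= 1` cutoff changes how many sequences count as hits, which is the same trade-off the error setting controls in the exercise.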

      Possible Signature sequences  (- strand, 5'-3'):



      Possible Probes (+ strand, 5'-3'):




D.  Which of the 4 sequences above would work the best for a genus-specific probe (or signature sequence)?  Defend your answer.  Provide at least 2 reasons for your choice.

E.  When you run Probe Match, the size default is "both".  Why do you suppose you'd want both instead of just >1200 or <1200?  When set to "both", if the results show that your probe hits 23 of the 52 sequences, does this mean that 29 organisms have sequences with mismatches to the probe in the region of the 16S rRNA that the probe targets?  Defend your answer.

F.  Look back at your results for signature sequence 2 above (i.e., when you matched it in Probe Match).  Was the probe complementary to sequence from all members in this group (with no error)?   Probe Match allows you to restrict your search by entering a region of the sequence that should contain your signature sequence/probe target. This narrows the search to include only the sequences in the database that contain sequence data in the target region complementary to your probe.  Note that it still checks the entire sequence, not just the region entered.  Run Probe Match again, setting the region to 1240 to 1290.  How do these results compare to those you had before?  Why are they different?  What do these results tell you about the utility of the probe compared to what you knew before?
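The region restriction described in F can be pictured as a filter on which database entries are considered at all. The sketch below only illustrates that filtering step under simple assumptions: each entry is reduced to the first and last positions for which it has sequence data, and the three entries and their coordinates are invented, not taken from the RDP.

```python
from typing import NamedTuple

class Record(NamedTuple):
    name: str
    first_pos: int  # first position with sequence data
    last_pos: int   # last position with sequence data

def covers_region(record: Record, region_start: int, region_end: int) -> bool:
    """True when the record has sequence data across the whole region."""
    return record.first_pos <= region_start and record.last_pos >= region_end

# Invented entries: one full-length and two partial sequences.
records = [
    Record("full_length", 1, 1500),
    Record("five_prime_partial", 1, 700),
    Record("three_prime_partial", 900, 1500),
]

eligible = [r.name for r in records if covers_region(r, 1240, 1290)]
print(len(eligible), "of", len(records), "entries contain the region:", eligible)
```

Only the entries in `eligible` would then be checked against the probe, so the total that hits are reported against changes when a region is entered.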



(Do not hand in this next part.  These sequences are needed for Lab 2.2, so you need to have found them before you can do that lab.  Be sure to use appropriate nucleic acid databases and not protein databases; otherwise you can use any database you prefer.)

Find the following rRNA sequences (or the rRNA gene sequences) and copy and paste a FASTA version of each sequence into the text file you started above in B (reminder: do not use a word processing program like Microsoft Word). Note that most of these aren't in the RDP currently, as the Eukarya section is still being prepared, so you will need to look elsewhere to retrieve most of these sequences.  For some of these there will be more than one option to choose from.  If possible, select complete or large partial sequences rather than small partial sequences (about 1500 nt for 16S; over 1400 and preferably close to 1600 nt for the protozoa; and at least 1700 nt for the other 18S, although 1900 is better), and do NOT use sequences that also contain other genes or spacers.  The protozoa files may refer to these as 16S or 16S-like, as they are much smaller than typical 18S rRNA.  We will be using these sequences to perform multiple alignments and create phylogenetic trees, and the sequences you select can affect your results.

      Methylomonas scandinavica 16S rRNA (already retrieved above in part B)

      Mouse 18S rRNA

      Human (Homo sapiens) 18S rRNA

      Xenopus laevis 18S rRNA

      Actaea japonica 18S rRNA

      Chlamydomonas nivalis 18S rRNA

      Trichomonas tenax 18S rRNA

      Giardia intestinalis 18S rRNA

      Methanocaldococcus jannaschii 16S rRNA

      Escherichia coli 16S rRNA

      Pyrodictium occultum 16S rRNA
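Once the sequences are collected, their lengths can be checked against the size guidelines given above before they are used in Lab 2.2. The sketch below is one possible check, not a required step. The 1400 nt and 1700 nt cutoffs for the protozoa and the other 18S sequences follow the guidelines; the 1400 nt cutoff for 16S is an assumption, since the guideline only says "about 1500". The category labels and the placeholder sequences (runs of `N` with invented lengths) are made up for illustration.

```python
# Minimum lengths in nucleotides. The 16S value is an assumed tolerance
# around "about 1500"; the other two follow the handout's guidelines.
MIN_LENGTH = {"16S": 1400, "18S_protozoa": 1400, "18S": 1700}

def long_enough(sequence: str, kind: str) -> bool:
    """True when the sequence meets the minimum length for its category."""
    return len(sequence) >= MIN_LENGTH[kind]

# Placeholder sequences with invented lengths, standing in for real records.
collected = {
    "bacterial 16S": ("N" * 1480, "16S"),
    "protozoan 16S-like": ("N" * 1550, "18S_protozoa"),
    "vertebrate 18S (short partial)": ("N" * 900, "18S"),
}

report = {name: long_enough(seq, kind) for name, (seq, kind) in collected.items()}
for name, ok in report.items():
    print(name, "OK" if ok else "too short, look for a longer entry")
```

A length check does not detect records that also contain other genes or spacers; those still have to be excluded by reading the record description.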




  © 2002 The Board of Regents of the University of Wisconsin System.

Click here to email comments to Scott Cooper regarding this site or its links.