Lab 3.2

BIO/MIC440: Bioinformatics Lab 3-2

Yeast

1. Open the Saccharomyces Genome Database (SGD) at http://genome-www.stanford.edu/Saccharomyces/ and click on "Gene/Seq Resources".

2. Copy the following DNA sequence, paste it into the sequence box, and submit it.

ATGTCCATCG AGGAGGAAGA TACAAATAAG ATCACATGTA CGCAAGACTT TCTTCACCAA TACTTTGTAA CTGAAAGGGT TAGCATTCAA TTTGGGTTAA ATAACAAGAC CGTAAAAAGG ATAAATAAAG ATGAATTTGA TAAGGCAGTA AATTGTATCA TGTCATGGAC AAACTATCCT AAGCCTGGGT TAAAACGAAC AGCTTCAACG TACCTCTTAA GCAATTCCTT TAAGAAATCT GCAACAGTAT
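The submission form will probably accept the sequence with its spaces, but if you want to confirm that nothing was lost while copying, a short Python sketch like this one strips the spacing, checks for stray characters, and counts the bases (there should be 250). The function name `clean_dna` is made up for illustration and is not part of SGD.

```python
# The 250-base query from step 2, copied in its original groups of ten.
QUERY_TEXT = (
    "ATGTCCATCG AGGAGGAAGA TACAAATAAG ATCACATGTA CGCAAGACTT "
    "TCTTCACCAA TACTTTGTAA CTGAAAGGGT TAGCATTCAA TTTGGGTTAA "
    "ATAACAAGAC CGTAAAAAGG ATAAATAAAG ATGAATTTGA TAAGGCAGTA "
    "AATTGTATCA TGTCATGGAC AAACTATCCT AAGCCTGGGT TAAAACGAAC "
    "AGCTTCAACG TACCTCTTAA GCAATTCCTT TAAGAAATCT GCAACAGTAT"
)


def clean_dna(raw: str) -> str:
    """Remove whitespace and digits, uppercase, and reject anything but A, C, G, T, N."""
    seq = "".join(ch for ch in raw.upper() if not (ch.isspace() or ch.isdigit()))
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {sorted(unexpected)}")
    return seq


query = clean_dna(QUERY_TEXT)
gc_fraction = (query.count("G") + query.count("C")) / len(query)
print(f"{len(query)} bases, starts with {query[:3]}, GC = {gc_fraction:.1%}")
```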

3. After your submission is complete, you will be given a list of six sequence analyses from which to choose. You want to see if the DNA sequence that you entered is part of a specific yeast gene. Which analysis should you do? Do it, ignoring any warnings that may come up.

4. How many hits did you get with the DNA sequence from step 2?

Which of the hits is "better"?

What evidence supports your decision?

5. Obtain the ORF map for the hit you chose in step 4.

What gene is this DNA sequence a part of?

Which part of the gene does the query sequence come from (e.g., 5' end, 3' end, or middle)?

What chromosome is the gene on?

Where is this gene located with respect to the centromere (i.e., is it on the left arm or the right arm of the chromosome)?
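Once you have the gene's coordinates and the centromere's coordinates from the chromosome map, deciding the arm is a simple comparison. The sketch below assumes the usual convention that the left arm holds the lower coordinates (from position 1 up to the centromere); the function name and all numbers are hypothetical, not REC104's real values.

```python
def chromosome_arm(gene_start: int, gene_end: int, cen_start: int, cen_end: int) -> str:
    """Classify a gene as left arm, right arm, or overlapping the centromere.

    Assumes the left arm holds the lower coordinates. Start and end may be given
    in either order, so genes on the Crick strand (start > end) work too.
    """
    low, high = min(gene_start, gene_end), max(gene_start, gene_end)
    if high < cen_start:
        return "left arm"
    if low > cen_end:
        return "right arm"
    return "overlaps centromere"


# Hypothetical coordinates, for illustration only.
print(chromosome_arm(410_000, 409_200, cen_start=150_000, cen_end=150_120))  # right arm
```

The same check works for REC114 in step 9.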

6. Select the gene by clicking it on the map.

What is the Systematic Name for this gene? (So far we have been working with the Standard Name.)
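Systematic names for nuclear yeast ORFs encode location: Y, then a letter for the chromosome (A = I through P = XVI), then L or R for the left or right arm, a three-digit ORF number counted outward from the centromere, and W or C for the Watson or Crick strand, sometimes followed by a suffix such as "-A". The Python sketch below decodes that pattern so you can cross-check your answers to step 5. It covers nuclear ORF names only; mitochondrial, tRNA, and other feature names follow different patterns. "YAL001C" is just an arbitrary example of the format.

```python
import re

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Y + chromosome letter (A-P) + arm (L/R) + 3-digit ORF number + strand (W/C),
# with an optional suffix such as "-A" (matched but not captured).
SYSTEMATIC_NAME = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(?:-[A-Z])?$")


def decode_systematic_name(name: str) -> dict:
    match = SYSTEMATIC_NAME.match(name.strip().upper())
    if match is None:
        raise ValueError(f"{name!r} is not a nuclear ORF systematic name")
    chrom, arm, number, strand = match.groups()
    return {
        "chromosome": ROMAN[ord(chrom) - ord("A")],
        "arm": "left" if arm == "L" else "right",
        "orf_number": int(number),
        "strand": "Watson" if strand == "W" else "Crick",
    }


print(decode_systematic_name("YAL001C"))
```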

What is its mutant phenotype?

What biological processes is the gene involved in?

7. When the gene was compared with sequences from other organisms, which two other species were shown to have similar genes?

Do you suppose any of these "homologies" are biologically relevant? Why or why not? (Use the e-values for the alignments to support your answer.)
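As background for interpreting e-values: in the standard BLAST (Karlin-Altschul) statistics, the expected number of chance alignments scoring at least S is

```latex
E = K\,m\,n\,e^{-\lambda S} = m\,n\,2^{-S'}, \qquad S' = \frac{\lambda S - \ln K}{\ln 2}
```

where S is the raw alignment score, S' the bit score, m and n the lengths of the query and the database, and K and λ constants set by the scoring matrix and gap penalties. The smaller E is, the fewer alignments this good you would expect to find by chance in a database of that size.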

Click on the accession number link for one of the two homologs. (If the page takes too long to load, search the NCBI protein databases using the Q---- accession number.) Do you recognize any of the authors of the paper describing the work that identified these sequences? :)

Select the REC104 sequences from all three species to generate a protein alignment of all three. In the resulting ClustalW alignment, comment on the approximate numbers (a rough idea, not exact counts) of exact matches, strong similarities, weak similarities, and non-matches across all three sequences.
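In Clustal-format output, the line under each block of sequences marks every column: "*" for identical residues, ":" for strong similarity, "." for weak similarity, and a blank for no match. If you want a rough tally rather than counting by eye, a Python sketch like this one counts the marks in one pasted block (sum the results over blocks for the whole alignment). The block below is hand-made for illustration, not real Rec104 data, and the function names are hypothetical.

```python
from collections import Counter

# Conservation marks under a Clustal alignment block.
MARKS = {"*": "identical", ":": "strong", ".": "weak", " ": "none"}


def residue_columns(seq_line: str) -> tuple[int, int]:
    """Character offsets where the residues start and end in one Clustal sequence line."""
    name = seq_line.split()[0]
    rest = seq_line[len(name):]
    start = len(name) + len(rest) - len(rest.lstrip())
    return start, start + len(rest.split()[0])


def count_marks(block: list[str]) -> Counter:
    """Count conservation marks in one block: sequence lines, then the conservation line."""
    *seq_lines, conservation = block
    start, end = residue_columns(seq_lines[0])
    # Pad in case trailing blanks were lost when the output was copied.
    return Counter(MARKS.get(mark, "other") for mark in conservation.ljust(end)[start:end])


# A hand-made block for illustration only.
example = [
    "seqA      MKTAYIAKQR",
    "seqB      MKSAYLAKQR",
    "seqC      MRWAWIGKQR",
    "          *: *::.***",
]
print(count_marks(example))
```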

By eyeballing the alignment, which of the two non-cerevisiae species has a Rec104 protein more similar to the protein from Saccharomyces cerevisiae? How do you know?

Does your answer to the previous question agree with the data shown in the dendrogram (at the top left of the PSI-BLAST results page)? Should it?

8. Go back to the REC104 Gene/Sequence Resources page and retrieve the 6-frame translation of the Saccharomyces cerevisiae REC104 gene.
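To see what a 6-frame translation does, here is a minimal Python sketch using the standard genetic code: it translates the given strand in three offsets and the reverse complement in three offsets, then reports the longest Met-to-stop stretch in each frame. Frame labels (+1 to +3, -1 to -3) are one common convention and may not match how SGD numbers its frames; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' is a stop, 'X' an ambiguous codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(protein: str) -> str:
    """Longest stretch from an M to the next stop (or to the end of the frame)."""
    best = ""
    for segment in protein.split("*"):
        first_met = segment.find("M")
        if first_met != -1 and len(segment) - first_met > len(best):
            best = segment[first_met:]
    return best


for frame, protein in six_frames("ATGTCCATCGAGGAGGAAGA").items():
    print(frame, protein, "longest ORF:", len(longest_orf(protein)), "aa")
```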

What information is obtained using this function?

Which reading frame appears to contain the open reading frame?

Are there any introns in this gene?

How do you know?

How many bases make up the ORF, and how many amino acids does it encode? Go to the REC104 Summary page and retrieve the sequences to check your answer.
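The arithmetic for an intron-free ORF is: length = end - start + 1, codons = length / 3, and amino acids = codons - 1 when the stop codon is included in the coordinates. A Python sketch, with hypothetical coordinates rather than REC104's:

```python
def orf_length_and_protein(orf_start: int, orf_end: int, stop_included: bool = True) -> tuple[int, int]:
    """Return (nucleotides, amino acids) for an intron-free ORF, 1-based inclusive coordinates."""
    nucleotides = abs(orf_end - orf_start) + 1
    if nucleotides % 3:
        raise ValueError(f"{nucleotides} nt is not a whole number of codons")
    codons = nucleotides // 3
    # The stop codon is part of the ORF but does not add an amino acid.
    amino_acids = codons - 1 if stop_included else codons
    return nucleotides, amino_acids


print(orf_length_and_protein(1, 300))  # (300, 99), hypothetical coordinates
```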

9. On the same summary page, click on the link next to the Biological Process "meiotic recombination". It leads to a list of other, functionally related genes. Click on the REC114 gene.

Which chromosome is this gene on? Is it on the left or right arm of that chromosome?

10. Retrieve the DNA sequence of REC114.

How many bases make up the genomic sequence of this gene?

How many amino acids should this encode?

Why is this not a whole number, and does this predicted number of amino acids match the length of the protein encoded by the ORF? If not, why not?
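When a gene contains an intron, its genomic length need not be a multiple of three, but the total length of its exons (the spliced coding sequence) should be. The Python sketch below shows that comparison for a hypothetical two-exon gene; the coordinates are invented and are not REC114's.

```python
def spliced_length(exons: list[tuple[int, int]]) -> int:
    """Total exon length, given 1-based inclusive (start, end) pairs."""
    return sum(abs(end - start) + 1 for start, end in exons)


# Hypothetical two-exon gene whose genomic span includes a 100-nt intron.
genomic_span = (1, 1000)
exons = [(1, 100), (201, 1000)]

genomic_nt = genomic_span[1] - genomic_span[0] + 1
coding_nt = spliced_length(exons)
print(genomic_nt / 3)                  # not a whole number of codons
print(coding_nt, coding_nt // 3 - 1)   # 900 nt -> 300 codons -> 299 aa plus a stop
```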

Part II

You have narrowed your gene search from Lab 3.1 to the F7F1 BAC of Arabidopsis thaliana. Here are some links that you will need for this exercise:

TAIR

BLAST

1. Obtain the sequence of the BAC clone through TAIR and run it through GENSCAN. You can either paste the sequence into the GENSCAN window or save it as a text file (on the desktop or a disk) and load that file into the program. For a primer on using GENSCAN, see the Splice Sites page of the molecular biology site.
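GenBank-style records number every sequence line and put header text above the sequence, and any of that carried into the GENSCAN box may be read as part of the input. If you prefer to prepare the text file programmatically, a Python sketch like this one keeps only the letters from the ORIGIN section and writes them as FASTA (paste just the bare sequence instead if the form you use does not take a FASTA header). The record below is a made-up fragment, and the function names are hypothetical.

```python
import re


def sequence_from_genbank(text: str) -> str:
    """Pull the bare sequence out of a GenBank-style record.

    Uses only the ORIGIN section when present (so header words are not taken
    as bases), stops at the '//' terminator, and keeps letters only, which
    removes the position numbers and spacing.
    """
    if "ORIGIN" in text:
        text = text.split("ORIGIN", 1)[1]
    text = text.split("//", 1)[0]
    return re.sub(r"[^A-Za-z]", "", text).upper()


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


record = """LOCUS       EXAMPLE
ORIGIN
        1 gatcaaatcg aaaccc
       17 ttt
//
"""
seq = sequence_from_genbank(record)
print(to_fasta("F7F1_example", seq))
```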

According to the GENSCAN output, there should be 16 genes on this BAC. If you get a different number, what might you have done incorrectly before running the scan? Fix the problem and rerun the scan until you get 16 genes.

Which putative gene is the largest (in terms of number of nucleotides)? How many nucleotides (ignore the promoter and the poly A+ site)? How many introns?
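With 16 predicted genes, it can help to tabulate the exon table rather than scan it by eye. This Python sketch assumes the first six columns of each line in GENSCAN's predicted-exon table are Gn.Ex, Type, S (strand), Begin, End, and Len; check that against your own output before trusting it. It skips promoter (Prom) and poly(A) (PlyA) lines and reports, per gene, the span from first to last coding exon, the total exon length, and the number of introns. The sample lines are invented.

```python
import re
from collections import defaultdict

# Assumed layout: Gn.Ex, Type, S, Begin, End, Len as the first six columns.
EXON_LINE = re.compile(r"^\s*(\d+)\.(\d+)\s+(Init|Intr|Term|Sngl|Prom|PlyA)\s+([+-])\s+(\d+)\s+(\d+)")
CODING_TYPES = {"Init", "Intr", "Term", "Sngl"}  # Prom and PlyA lines are ignored


def summarize_genes(genscan_text: str) -> dict[int, tuple[int, int, int]]:
    """Map gene number -> (span in nt, total exon nt, number of introns)."""
    exons = defaultdict(list)
    for line in genscan_text.splitlines():
        match = EXON_LINE.match(line)
        if match and match.group(3) in CODING_TYPES:
            begin, end = int(match.group(5)), int(match.group(6))
            exons[int(match.group(1))].append((min(begin, end), max(begin, end)))
    summary = {}
    for gene, coords in exons.items():
        span = max(e for _, e in coords) - min(s for s, _ in coords) + 1
        exon_nt = sum(e - s + 1 for s, e in coords)
        summary[gene] = (span, exon_nt, len(coords) - 1)
    return summary


# Made-up lines in the assumed format, for illustration only.
sample = """
 1.01 Init +    101    250  150
 1.02 Intr +    351    500  150
 1.03 Term +    601    700  100
 1.04 PlyA +    900    905    6
 2.00 Prom -   3000   2961   40
 2.01 Sngl -   2500   2000  501
"""
genes = summarize_genes(sample)
largest = max(genes, key=lambda g: genes[g][0])
print(genes, "largest:", largest)
```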

2. You have mutant plants that seem to have trouble converting oxaloacetate (OAA) to the amino acid aspartate. What could you do to determine whether there is a candidate gene on the BAC that could carry out this reaction? DO IT.

Which "gene" did you find (GENSCAN number)? Give the predicted amino acid sequence of the protein. How many introns are in the gene? What are the sizes of the introns?
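Intron sizes follow from the exon coordinates: each intron runs from one base after an exon's end to one base before the next exon's start, so its size is next_start - previous_end - 1. A Python sketch, with hypothetical coordinates rather than real GENSCAN output for F7F1:

```python
def intron_sizes(exons: list[tuple[int, int]]) -> list[int]:
    """Gaps between consecutive exons, given 1-based inclusive (begin, end) pairs.

    Begin and end may be in either order, so minus-strand predictions work too.
    """
    ordered = sorted((min(b, e), max(b, e)) for b, e in exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Hypothetical exon coordinates.
print(intron_sizes([(101, 250), (351, 500), (601, 700)]))  # [100, 100]
print(intron_sizes([(2500, 2401), (2300, 2001)]))          # [100]
```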

3. Is the match between the GENSCAN predicted peptide and the "real" peptide in GenBank perfect? Even if it is, why would you not necessarily expect it to be?
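For a quick first look at where a predicted peptide and a database peptide differ, Python's standard `difflib.SequenceMatcher` can list the mismatched stretches. It does plain string matching with no substitution matrix, so treat it only as a rough screen; a BLAST or ClustalW alignment is the proper comparison. The peptides below are toy strings, and `compare_peptides` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def compare_peptides(predicted: str, reference: str) -> tuple[float, list[tuple[str, str, str]]]:
    """Return (fraction of positions matched, list of differing stretches)."""
    matcher = SequenceMatcher(None, predicted, reference, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    differences = [
        (tag, predicted[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    longest = max(len(predicted), len(reference)) or 1  # avoid dividing by zero
    return matched / longest, differences


# Toy peptides for illustration only.
print(compare_peptides("MKTAYIAKQR", "MKTAYLAKQR"))  # (0.9, [('replace', 'I', 'L')])
```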

4. How would you test if this putative gene is your gene of interest? (Hint: you have mutant plants.)

Comments regarding this site or its links can be emailed to Scott Cooper.