BIO/MIC440: Bioinformatics Section 3, Take-Home Assignment

Use the link to download the sequence of yeast clone #71020.

Use Genscan to find the ORFs in this sequence using "Vertebrate" as your organism.

     1. How many complete genes are there?
     2. How many of the complete genes have introns?
     3. How many amino acids are there in ORF #3?
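If you would rather not count residues by eye for question 3, a short script can do it once the predicted peptide has been saved in FASTA format. The sketch below is only an illustration: the header and sequence are made up (they are not the real ORF #3 peptide), and the function name `fasta_lengths` is hypothetical.

```python
def fasta_lengths(fasta_text):
    """Return {header: residue_count} for each record in FASTA-formatted text."""
    lengths = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            lengths[header] = 0
        elif header is not None:
            # Skip any '*' stop symbol in case the tool writes one.
            lengths[header] += len(line.replace("*", ""))
    return lengths


# Made-up record for illustration only; paste your own predicted peptide here.
example = """>hypothetical_peptide_3
MSTNPKPQRK
TKRNTNRRPQ
"""

print(fasta_lengths(example))  # {'hypothetical_peptide_3': 20}
```

Compare the result with the length reported by the prediction program as a sanity check.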

Copy the predicted protein sequence from ORF #3 and use that sequence to perform an appropriate search to determine the identity of the protein and the gene that encodes it.

     4. What is the name of the gene that encodes this protein?

5. Based on the gene acronym (and other information that you have probably found), in what molecular process do you suppose this gene is involved?

Locate the DNA sequence of the gene. There are many ways to do this, but all of them should get you to the same answer. Import this yeast DNA sequence into Biology Workbench.

NOTE: Give every sequence you enter in this assignment a descriptive title so that your final output is clearly labeled and I can tell what everything is (5 pts. off if I have to work at it at all).

Retrieve the protein sequence of this gene from Saccharomyces cerevisiae and import it into Biology Workbench.

Next, use any of the multiple ways that exist to retrieve the DNA coding regions (and the protein sequences) from homologous genes from Kluyveromyces lactis, Candida glabrata, Candida albicans, and Pichia stipitis. Import all of these sequences into Biology Workbench so that you have a total of five DNA coding sequences and five protein sequences.

6. Perform a CLUSTALW analysis of the five DNA coding sequences that you have imported, then use the result to draw an unrooted tree. Also examine the alignment using TEXSHADE or BOXSHADE, whichever you prefer (referred to below as ***SHADE). Print out the tree (one page) and the third and third-to-last pages (two pages) of the ***SHADE analysis to hand in. (I am requesting only two pages of the ***SHADE analysis to save paper while still confirming that you did it correctly.)

7. What can you conclude about the relationships among the five organisms based on the DNA sequence of this gene? Be specific about who is more closely related to whom based on this single gene. Do these relationships make sense with respect to what you know (or can find out) about each of the five organisms?

8. Repeat the CLUSTALW analysis using the five protein sequences that you imported. Use that information to draw an unrooted tree and examine the alignment using ***SHADE. Print out the tree (one page) and the entire ***SHADE alignment analysis to hand in (this one isn't so long).

9. Were any differences observed in the trees built using DNA vs. protein sequences? What part of the DNA coding sequence seems to be most conserved among the five organisms? Does this correspond to the most conserved portion of the amino acid sequence?

10. Is the DNA or protein sequence a more accurate predictor of evolutionary relationships among organisms? Briefly explain your answer.
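For question 9, the shaded alignment is the intended way to spot conserved regions, but the same idea can be expressed numerically. The sketch below scores each alignment column by the fraction of sequences that share the most common residue, then slides a window along the alignment to find the highest-scoring stretch. The function names, the window width, and the five toy sequences are all assumptions made for illustration; they are not output from CLUSTALW or Biology Workbench.

```python
def column_identity(aligned):
    """Per column: fraction of sequences sharing the most common non-gap residue."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common = max(residues.count(r) for r in set(residues))
        # Divide by the number of sequences so that gaps lower the score.
        scores.append(most_common / len(aligned))
    return scores


def most_conserved_window(aligned, width):
    """Return (start_index, mean_score) of the best window of `width` columns."""
    scores = column_identity(aligned)
    best_start, best_mean = 0, -1.0
    for start in range(len(scores) - width + 1):
        mean = sum(scores[start:start + width]) / width
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Toy aligned sequences (gaps shown as '-'); not real yeast data.
toy = [
    "ATGGCTAAAG",
    "ATGGCAAAGG",
    "ATGGCTCACG",
    "ATGACTAATG",
    "ATG-CTGAAG",
]

print(most_conserved_window(toy, 3))  # (0, 1.0)
```

Running the same scoring on the DNA alignment and on the protein alignment gives two profiles that can be compared region by region, keeping in mind that one amino acid column corresponds to three nucleotide columns.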

Go to the Stanford Microarray Database to find the conditions under which this yeast gene is highly expressed. (You can find the gene by accessing experiment 17017, sorting by log(base2) of R/G Normalized Ratio (Mean) in ascending order, displaying the Standard Name (far right box), and looking at rows 5401-5600.) Remember to undo all the filters!

NOTE: At one time, the STANDARD NAME choice resulted in a column with no information. If this happens to you, you must use the Saccharomyces Genome Database to find the Systematic Name and search for your gene using that.

11. What are the conditions under which expression was assayed for this experiment (i.e. what are the channel 1 and 2 probes)?

12. Give a reason why the results for the gene of interest may not be biologically significant in experiment #17017 (view the spot and then look at the whole slide).

13. Which experiment number gives the highest log2 ratio for this gene (ignore the yellow bars)? What is the log2 R/G ratio for the gene in this experiment?

14. Does the gene of interest show greater than 8-fold changes in gene expression (ignore the yellow bars) under any of the conditions represented in the Stanford database? If so, for how many experiments is this true? List the experiment ID numbers, the ratios, and the spot colors.
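Because the database reports ratios on a log2 scale while question 14 is phrased as a fold change, it helps to convert between the two: an 8-fold change corresponds to a log2 ratio of 3, since 2^3 = 8. The sketch below shows the arithmetic. The function names are hypothetical, and treating "change" as either induction or repression (that is, using the absolute value of the log2 ratio) is an assumption about how the question should be read.

```python
import math


def fold_change(log2_ratio):
    """Convert a log2(R/G) value to a linear R/G ratio."""
    return 2.0 ** log2_ratio


def exceeds_fold(log2_ratio, fold):
    """True if the change is greater than `fold` in either direction (assumption)."""
    return abs(log2_ratio) > math.log2(fold)


print(fold_change(3.0))        # 8.0
print(exceeds_fold(3.2, 8))    # True
print(exceeds_fold(-3.5, 8))   # True
print(exceeds_fold(2.9, 8))    # False
```

The same conversion gives a log2 ratio of 2 for a 4-fold change.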

In the yeast nitrogen starvation experiment (#12732):

15. What are the Channel 1 and Channel 2 probes for this experiment?

16. Which gene (Standard Name) has the highest significant log(base2) R/G Normalized Ratio (Mean)? (Remember that values <150 for channel 1 or 2 are not significant.)

17. What is the value of the ratio?

18. What is the molecular function of the protein encoded by this gene?

19. In what biological process is the protein involved?

20. Which known gene has the lowest significant R/G ratio?

21. What is the value of the ratio?

22. Is there any biological process information for this particular yeast gene? If so, what?

23. Why might the expression of this gene be affected the way that it is when the organism is starved for nitrogen?

24. How many different genes show a significantly greater than 4-fold higher level of expression in the presence vs. the absence of nitrogen?

25. What do your answers to the previous questions imply about the effects of nitrogen depletion in suppressing gene expression?
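The significance rule from question 16 (channel values below 150 are not significant) and the fold threshold from question 24 can be applied to an exported table with a few lines of code. The sketch below is only an illustration: the field names `ch1`, `ch2`, and `log2_ratio`, the gene names, and all of the numbers are made up, and they do not reflect the actual column names or values in the Stanford Microarray Database. Which sign of the log2 ratio corresponds to "presence of nitrogen" depends on your answer to question 15, so the direction used here is also an assumption.

```python
THRESHOLD = 150  # channel values below this are not significant (question 16)


def significant(rows, threshold=THRESHOLD):
    """Keep only spots whose channel 1 AND channel 2 values reach the threshold."""
    return [r for r in rows if r["ch1"] >= threshold and r["ch2"] >= threshold]


# Made-up spots for illustration only.
rows = [
    {"name": "GENE_A", "ch1": 900, "ch2": 3600, "log2_ratio": 2.0},
    {"name": "GENE_B", "ch1": 120, "ch2": 1900, "log2_ratio": 3.98},
    {"name": "GENE_C", "ch1": 4000, "ch2": 400, "log2_ratio": -3.32},
    {"name": "GENE_D", "ch1": 300, "ch2": 2700, "log2_ratio": 3.17},
]

kept = significant(rows)  # GENE_B is dropped because ch1 < 150
highest = max(kept, key=lambda r: r["log2_ratio"])
lowest = min(kept, key=lambda r: r["log2_ratio"])

# "Greater than 4-fold" means a log2 ratio strictly above 2 (since 2**2 == 4).
over_4_fold = [r["name"] for r in kept if r["log2_ratio"] > 2.0]

print(highest["name"], lowest["name"], over_4_fold)
```

Note that GENE_B has the largest ratio in the toy table but is excluded by the intensity filter, which is exactly the situation the reminder in question 16 is warning about.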

Click here to email comments to Scott Cooper regarding this site or its links.