BIO/MIC440: Bioinformatics Section 3, Take-Home Assignment
Use the link to download the sequence of yeast clone #71020.
Use Genscan to find the ORFs in this sequence using "Vertebrate" as your organism.
1. How many complete genes are there?
2. How many of the complete genes have introns?
3. How many amino acids are there in ORF #3?
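If you would rather not count residues by eye, a few lines of Python can check the length of a peptide pasted in FASTA format. This is only a minimal sketch: the function name `count_residues` and the example peptide are made up for illustration and are not the sequence from this assignment.

```python
def count_residues(fasta_text: str) -> int:
    """Count amino acids in a single-record FASTA string."""
    residues = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        # Skip blank lines and the ">" title line.
        if not line or line.startswith(">"):
            continue
        # Count only letters, so "*" stop symbols, digits and spaces are skipped.
        residues += sum(1 for ch in line if ch.isalpha())
    return residues


# Hypothetical peptide, NOT the answer to question 3.
example = """>hypothetical_peptide (made-up sequence)
MSTNPKPQRK
TKRNTNRRPQ*
"""

print(count_residues(example))
```

Paste your own predicted peptide in place of `example` to check the number you report.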
Copy the predicted protein sequence from ORF #3 and use that sequence to perform an appropriate search to determine the identity of the protein and the gene that encodes it.
4. What is the name of the gene that encodes this protein?
5. Based on the gene acronym (and other information that you have probably found), in what molecular process do you suppose this gene is involved?
Locate the DNA sequence of the gene. There are many ways to do this, but all of them should get you to the same answer. Import this yeast DNA sequence into Biology Workbench.
NOTE: Be sure that all sequences entered in this assignment have descriptive titles, so that your final output is clearly labeled and I know what everything is! (5 pts. off if I have to work at it at all).
Retrieve the protein sequence of this gene from Saccharomyces cerevisiae and import it into Biology Workbench.
Next, use any of the multiple ways that exist to retrieve the DNA coding regions (and the protein sequences) from homologous genes
from Kluyveromyces lactis, Candida glabrata, Candida
and Pichia stipitis. Import all of these sequences into Biology Workbench
so that you have a total of five DNA coding sequences and five protein sequences.
6. Perform a CLUSTALW analysis of the five DNA coding sequences that you have imported. Then use the information to draw an unrooted tree. Also, examine the alignment using TEXSHADE or BOXSHADE (whichever you prefer = ***SHADE). Print out the tree (one page) and the 3rd and 3rd-to-last pages (two pages) from the ***SHADE analysis to hand in. (I am requesting only two pages of the ***SHADE analysis to save paper while still confirming that you did it correctly.)
7. What can you conclude about the relationships among the five organisms based on the DNA sequence of this gene? Be specific about which organisms are more closely related to which, based on this single gene. Do these relationships make sense with respect to what you know (or can find out) about each of the five organisms?
8. Repeat the CLUSTALW analysis using the five protein sequences that you imported. Use that information to draw an unrooted tree and examine the alignment using ***SHADE. Print out the tree (one page) and the entire ***SHADE alignment analysis to hand in (this one isn't so long).
9. Were any differences observed in the trees built using DNA vs. protein sequences? What part of the DNA coding sequence seems to be most conserved among the five organisms? Does this correspond to the most conserved portion of the amino acid sequence?
10. Is the DNA or the protein sequence a more accurate predictor of evolutionary relationships among organisms? Briefly explain your answer.
Go to the Stanford Microarray Database to find the conditions under which this yeast gene is highly expressed. (You can find the gene by accessing experiment 17017, sorting by log(base2) of R/G Normalized Ratio (Mean) in ascending order, displaying the Standard Name (far right box), and looking at rows 5401-5600.) Remember to undo all the filters!
NOTE: At one time, the STANDARD NAME choice resulted in a column with no information. If this happens to you, you must use the Saccharomyces Genome Database to find the Systematic Name and search for your gene using that.
11. What are the conditions under which expression was assayed for this experiment (i.e., what are the channel 1 and 2 probes)?
12. Give a reason why the results for the gene of interest may not be biologically significant in experiment #17017 (view the spot and then look at the whole slide).
13. Which experiment number gives the highest log2 Ratio for this gene? (Ignore the yellow bars.) What is the log2 R/G ratio for the gene in this experiment?
14. Does the gene of interest show greater than 8-fold changes in gene expression (ignore the yellow bars) under any of the conditions represented in the Stanford database? If so, for how many experiments is this true? List the experiment ID numbers, the ratios, and the spot colors.
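The database reports ratios on a log(base2) scale, while question 14 is phrased in fold changes. The conversion is fold = 2^|log2 ratio|, so an 8-fold change corresponds to a log2 ratio above +3 or below -3. A minimal sketch of that arithmetic follows; the helper names `fold_change` and `exceeds_fold` are made up for illustration.

```python
import math


def fold_change(log2_ratio: float) -> float:
    """Convert a log2(R/G) ratio to a signed fold change.

    Positive values mean R > G; negative values mean G > R.
    """
    fold = 2.0 ** abs(log2_ratio)
    return fold if log2_ratio >= 0 else -fold


def exceeds_fold(log2_ratio: float, threshold_fold: float = 8.0) -> bool:
    """True if the change is strictly greater than threshold_fold in either direction."""
    return abs(log2_ratio) > math.log2(threshold_fold)


print(fold_change(3.0))    # 2**3
print(fold_change(-2.0))   # 4-fold, in the G > R direction
print(exceeds_fold(3.2))   # just over 8-fold
print(exceeds_fold(3.0))   # exactly 8-fold, so not "greater than"
```

By the usual two-color convention, a positive log2(R/G) is displayed as a red spot and a negative one as a green spot, but check the color key shown with each experiment rather than relying on this sketch.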
In the yeast nitrogen starvation experiment (#12732):
15. What are the Channel 1 and Channel 2 probes for this experiment?
16. Which gene (Standard Name) has the highest significant log(base2) R/G Normalized Ratio (Mean)? (Remember that values <150 for channel 1 or 2 are not significant.)
17. What is the value of the ratio?
18. What is the molecular function of the protein encoded by this gene?
19. In what biological process is the protein involved?
20. Which known gene has the lowest significant R/G ratio?
21. What is the value of the ratio?
22. Is there any biological process information for this particular yeast gene? If so, what?
23. Why might the expression of this gene be affected the way that it is when the organism is starved for nitrogen?
24. How many different genes show a significantly greater than 4-fold higher level of expression in the presence vs. the absence of nitrogen?
25. What do your answers to the previous questions imply about the effects of nitrogen depletion in suppressing gene expression?
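For the nitrogen starvation questions, the significance rule (ignore any spot whose channel 1 or channel 2 value is below 150) and the 4-fold cutoff (log2 ratio beyond +2 or -2) can be applied mechanically once the data are in a table. The sketch below uses a handful of made-up spots with hypothetical gene names, not data from experiment #12732. Which sign of the log2 ratio corresponds to "presence of nitrogen" depends on the channel probes you identified in question 15, so both directions are listed.

```python
from typing import NamedTuple


class Spot(NamedTuple):
    name: str          # Standard Name (hypothetical here)
    ch1: float         # channel 1 value
    ch2: float         # channel 2 value
    log2_ratio: float  # log(base2) R/G Normalized Ratio (Mean)


MIN_INTENSITY = 150      # values below this in either channel are not significant
LOG2_FOUR_FOLD = 2.0     # 2**2 = 4-fold


def significant(spots):
    """Keep only spots with both channel values at or above the cutoff."""
    return [s for s in spots if s.ch1 >= MIN_INTENSITY and s.ch2 >= MIN_INTENSITY]


# Made-up example rows for illustration only.
spots = [
    Spot("GENE_A", 1200, 300, -2.0),
    Spot("GENE_B", 90, 4000, 5.5),    # dropped: channel 1 is below 150
    Spot("GENE_C", 200, 3300, 4.0),
    Spot("GENE_D", 800, 820, 0.0),
    Spot("GENE_E", 2500, 160, -3.9),
]

kept = significant(spots)
highest = max(kept, key=lambda s: s.log2_ratio)
lowest = min(kept, key=lambda s: s.log2_ratio)
# Strictly greater than 4-fold in each direction.
up_4fold = [s.name for s in kept if s.log2_ratio > LOG2_FOUR_FOLD]
down_4fold = [s.name for s in kept if s.log2_ratio < -LOG2_FOUR_FOLD]

print(highest.name, lowest.name, up_4fold, down_4fold)
```

Note that GENE_B has the largest ratio in the made-up table but is excluded by the intensity rule, which is exactly the situation the reminder in question 16 is warning about.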