Questions for Bioinformatics Lab 2
Due in class or at the beginning of the lecture Wed., Jan. 31
Note: Please paste this document into a new window and answer the questions.
Question 1. Examine the output from your BLAST search. Did you identify a unique sequence in the database? Is there only one sequence that exactly matches all the bases in your query?
In your opinion, is the match biologically relevant? Hint: Look carefully at the E values. Do the matches appear to be random? Why or why not?
Question 2. Do all the matches correspond to DNA from one organism? Do most of them? Of the sequences that match your query and are human, do they all correspond to the same chromosome?
Question 3. Formulate a hypothesis to account for these (surprising) results.
Question 4. Is the information in this file consistent with your hypothesis? Why or why not? If not, would you now like to formulate a new hypothesis?
Question 5. What are the specific bases in the sequence that matched your query (please specify by number)? Compare these numbers to the "Features" information from the sequence file you retrieved above. Does your query align with any particular feature in the sequence?
Question 6. Look closely at the query sequence you used above to run the search. Why in your opinion do you find no alignments to the string of A's at the end of the query?
Question 7. How many sequences matched your query when searching the protein database? What do these sequences have in common?
Question 8. Based on the results of these two searches, which output do you feel was easier to interpret? Which search do you feel was more sensitive? Why?
Question 9. Retrieve the Swiss-Prot file corresponding to the best match from the most recent search (blastx). Read the "Comments" in the header. Given the results of the two searches you have performed, why are the managers of the Swiss-Prot database concerned about these sequences?
Question 10. What do you believe the query sequence encodes?
Question 11. How many of the matches do you believe are biologically relevant?
Question 12. How many bases of your query match each of these sequences? Please give your answers as fractions (for example, 87/101).
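If you want to check an identities fraction by hand, the count can be sketched as below. This is a minimal sketch, not BLAST's own code: the function name `identity_fraction` and the two short sequences are made up for illustration, and it assumes you have copied the two aligned lines (with "-" for gaps) from the alignment output, so the denominator is the alignment length.

```python
def identity_fraction(aligned_query: str, aligned_subject: str) -> str:
    """Return identities as 'matches/length' for two aligned, equal-length strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both rows carry the same base; a gap never counts as a match.
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return f"{matches}/{len(aligned_query)}"


# Made-up example rows, not taken from any real search result.
print(identity_fraction("ACGT-A", "ACCTTA"))
```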
Question 13. How much did the E value rise with only one base difference? (Your answer can be in the form of: The E value went from XX to XX).
Question 14. What is the result of the search? Why do you think you obtained this result? Does this result surprise you given that you know that the sequence is in the database?
Question 15. What parameters of the search do you feel you could adjust to try to get a match to a sequence in the database? Could you modify the expect value (E value) threshold? (If you are unclear what an E value is, review the definition of an Expect value.) Should you make the E value threshold larger or smaller to increase the likelihood of obtaining a match?
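As a reminder of how the E value depends on the score and on the sizes involved, the standard relation for bit scores is E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. The sketch below simply evaluates that formula. The function name `expect_value` and the database size used in the loop are assumptions for illustration, and the sketch ignores the effective-length corrections that the BLAST server applies.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with at least this bit score: m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical database size (in bases); use the value reported in your own output.
HYPOTHETICAL_DB_LEN = 1_000_000_000

for bits in (20.0, 30.0, 40.0):
    print(bits, expect_value(bits, query_len=13, db_len=HYPOTHETICAL_DB_LEN))
```

Try substituting the query length and database size from your own search to see how the E value changes with the bit score.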
Question 16. Do you get any matches from the search now? Are any to the same two matching sequences you found when using the complete (140 bp) query sequence?
Question 17. If you were to return to the BLAST server and run an advanced BLAST search with the same 13 base query sequence, and wanted to make the search more sensitive by altering the word length, would you use a smaller or larger word size? Recall that the default word size on nucleic acid searches is 11.
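To think about word size, it can help to list the overlapping words that a query contains. The sketch below does only that; it is not the BLAST seeding code. The function name `words` and the 13-base query are hypothetical, chosen just to show how the number of words changes with the word size.

```python
def words(sequence: str, word_size: int) -> list:
    """Return every overlapping word of length word_size found in sequence."""
    return [
        sequence[i:i + word_size]
        for i in range(len(sequence) - word_size + 1)
    ]


# Hypothetical 13-base query, not the one from the lab.
query = "ACGTACGTACGTA"

for w in (11, 7):
    print(w, len(words(query, w)))
```

A sequence of length L contains L - w + 1 overlapping words of size w, so you can work out the count for your own query without running the code.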
Question 18. Could you identify the sequence if you didn't know what it was?
Question 19. You will probably have realized that the FASTA reported sequences were almost all clone or vector sequences. As you know that the query sequence is from the human beta globin gene, click on the word "align" to the right of this sequence in the list of "best scores." To what bases (give numbers) in the sequence from the database does your query align?
Question 20. It is fairly straightforward to understand why few sequences in the database match your query well. BUT, why is the histogram output shaped like a sideways bell curve? That is, can you explain why few sequences in the database match your query very poorly? HINT: Think about the algorithm used to do the search.
Question 21. What does the query sequence match? (Give the locus name). Can you explain why the blastx search gave such a funny result?
Question 22. Can you explain why the blastx search gave such a different result? Which do you believe and why? HINT: Retrieve the best matching sequence from the nucleotide database. Carefully examine which specific bases of the sequence were used as a query.
Question 23. What is the moral of the story in this exercise (make up your own!!)? (e.g., The fact that the two search methods yielded completely different results has taught you what?).
Question 23. What is the SWISS-PROT primary accession number?
Question 24. What is the most common name of the protein?
Question 25. What is the gene called?
Question 26. In which year was the crystal structure of the catalytic domain determined? Name the first author of this work.
Question 27. Does the enzyme require a co-factor to function? If so, what?
Question 28. Name the most common disease that arises as a result of deficiency of this enzyme.
Question 29. Which cytogenetic locus does the gene reside at? (e.g. 13p10.1)
Question 30. What is the PAHdb?
Question 31. How many amino acid residues are there in the protein?
Question 32. What is the molecular weight of the protein?
Question 33. What was the number of sequences in the database when you ran the search?