Homology Search
In a homology search, a test sequence is compared to all of the sequences in a large database, and the database sequences with the closest matches, which are the most likely to be homologous, are reported. If you had sequenced a gene and didn't know whether it had been discovered before, you would perform this type of search. One can also search using a protein's amino acid sequence to find other homologous proteins. Homology searches are easily done over the web using the program BLAST (Basic Local Alignment Search Tool) from NCBI (National Center for Biotechnology Information, Bethesda, Maryland). The NCBI has a database called GenBank containing the publicly available DNA and protein sequences submitted from around the world. The number of submissions has increased exponentially in the past 20 years.
YEAR | BASES | SEQUENCES
1997 | 967,000,000 | 1,491,000
1998 | 1,622,000,000 | 2,356,000
1999 | 3,400,000,000 | 4,610,000
2000 | 10,300,000,000 | 9,102,000
2001 | 15,849,921,438 | 14,976,310
2002 | 28,507,990,166 | 22,318,883
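To put a number on "exponentially", the short Python sketch below computes the year-to-year fold change in bases and an average doubling time from the figures in the table. The function names are illustrative, and the doubling time is a rough estimate that assumes steady exponential growth between the first and last years.

```python
import math

# GenBank totals copied from the table: year -> (bases, sequences)
GENBANK = {
    1997: (967_000_000, 1_491_000),
    1998: (1_622_000_000, 2_356_000),
    1999: (3_400_000_000, 4_610_000),
    2000: (10_300_000_000, 9_102_000),
    2001: (15_849_921_438, 14_976_310),
    2002: (28_507_990_166, 22_318_883),
}

def yearly_growth(totals):
    """Return {year: fold change in bases relative to the previous year}."""
    years = sorted(totals)
    return {y: totals[y][0] / totals[p][0] for p, y in zip(years, years[1:])}

def doubling_time_years(totals):
    """Average doubling time of the base count, assuming steady exponential growth."""
    years = sorted(totals)
    first, last = years[0], years[-1]
    fold = totals[last][0] / totals[first][0]
    return (last - first) * math.log(2) / math.log(fold)

if __name__ == "__main__":
    for year, fold in yearly_growth(GENBANK).items():
        print(year, f"{fold:.2f}x the bases of the previous year")
    print(f"average doubling time: {doubling_time_years(GENBANK):.2f} years")
```

With these figures the number of bases grew every year and roughly doubled each year on average.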
When you submit a sequence for a search, it is compared to all of these sequences, and the best matches are displayed. Each sequence is given a unique number called an accession number, which makes it easier to keep track of individual sequences. For more information on how these programs work, visit the site "A Guide to Molecular Sequence Analysis" or see the bioinformatics lecture on this topic on this site.
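The idea of comparing one query against every database entry and reporting the best matches can be illustrated with a toy Python sketch. This is not how BLAST works internally (BLAST uses a much faster word-based heuristic and statistical scoring); it simply slides the query along each sequence without gaps and ranks entries by the fraction of matching positions. The accession numbers and sequences in TOY_DB are made up for the example.

```python
def best_ungapped_identity(query, subject):
    """Slide the shorter sequence along the longer one (no gaps) and
    return the highest fraction of matching positions found."""
    if len(query) > len(subject):
        query, subject = subject, query
    if not query:
        return 0.0
    best = 0
    for offset in range(len(subject) - len(query) + 1):
        matches = sum(q == s for q, s in zip(query, subject[offset:]))
        best = max(best, matches)
    return best / len(query)

def search(query, database, top=3):
    """Score the query against every entry and return the best matches
    as (accession, score) pairs, best first."""
    scored = [(best_ungapped_identity(query, seq), acc)
              for acc, seq in database.items()]
    scored.sort(reverse=True)
    return [(acc, score) for score, acc in scored[:top]]

# Hypothetical accession numbers and sequences, for illustration only.
TOY_DB = {
    "TOY001": "ATGGCCATTGTAATGGGCCGC",
    "TOY002": "TTTTTTTTTTTTTTTT",
    "TOY003": "ATGGCGATTGTTATGGGACGC",
}

if __name__ == "__main__":
    for accession, score in search("GCCATTGTAATG", TOY_DB):
        print(accession, f"{score:.2f}")
```

The entry containing an exact copy of the query is ranked first, the closely related entry second, and the unrelated entry last, which mirrors the ordering of a real results list.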
We can either use the program BLAST directly to perform a database search, or use the Biology WorkBench.
Using Biology WorkBench to search a sequence database.
Log onto the Biology WorkBench and either create a new session or resume an existing session. To search for amino acid sequences, select Protein Tools; to search for DNA or RNA sequences, select Nucleic Tools.
To use a nucleic acid sequence (NS) or a protein sequence (PS) to search a database (DB), first select the sequence you wish to use, then select either BLASTP for proteins or BLASTN for nucleic acids. A third option, BLASTX, translates a nucleic acid sequence in all six reading frames and uses the resulting amino acid sequences to search a protein database. You will then have to select the appropriate database to search and submit the search (for example, GBPRI1 is a GenBank Primate Sequence Database).
If you want to obtain one of the sequences that matched with your sequence, just click on the line and press Import to Workbench. To select multiple sequences hold down the control key as you select each sequence.
These sequences will be added to your stored sequences.
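The six-frame translation that BLASTX applies to a nucleic acid query can be sketched in a few lines of Python. This is a minimal illustration, not the code BLASTX itself runs: it assumes the standard genetic code, writes stop codons as "*" and any codon containing an unrecognized letter as "X", and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return the six translations keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for shift in range(3):
            frames[f"{sign}{shift + 1}"] = translate(strand[shift:])
    return frames

if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(name, protein)
```

Each of the six translations would then be compared against the protein database, which is why a BLASTX search can find a match even when you do not know which strand or reading frame encodes the protein.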
Using BLAST to search a sequence database.
1. We will use the program BLAST (www.ncbi.nlm.nih.gov/BLAST/) from NCBI.
2. Use the program BLAST 2.0. There are several different searches that can be performed. You should just select Basic BLAST Search.
3. Type or paste your sequence into the box below the button Submit Query. Do not add any spaces or characters other than A, C, G or T. Push the Submit Query button.
4. Your results will appear next. The files at the top of the list represent the best matches. Click on the blue file name to the left of the description to go to that sequence.
5. Each sequence will be accompanied by data indicating who submitted the sequence and any journal articles in which it has been published. Click on the button at the top of the page labeled FASTA; this gives you just the DNA sequence.
6. If you click on the button labeled Protein you will get a link to the translated amino acid sequence.
7. If you wish to save a sequence, highlight the DNA sequence using your mouse and copy it, either by pressing Control and C simultaneously or by selecting Copy under Edit.
8. You can now paste this sequence into other programs for further analysis, or into a word processing file for storage.
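Because the search box in step 3 accepts only the letters A, C, G and T, a sequence copied from a FASTA page may need tidying before it is pasted back into a search. The Python sketch below is one way to do this, assuming the copied text may include a FASTA description line (starting with ">"), line breaks and spaces; the function name clean_dna is illustrative.

```python
def clean_dna(text):
    """Strip a FASTA description line and whitespace, uppercase the rest,
    and raise ValueError if anything other than A, C, G or T remains."""
    lines = [ln for ln in text.splitlines()
             if not ln.lstrip().startswith(">")]
    seq = "".join("".join(lines).split()).upper()
    bad = sorted(set(seq) - set("ACGT"))
    if bad:
        raise ValueError(f"unexpected characters: {''.join(bad)}")
    return seq

if __name__ == "__main__":
    copied = ">example description line\natgg ccat\nTGTA\n"
    print(clean_dna(copied))
```

Raising an error rather than silently dropping unexpected letters is a deliberate choice here: ambiguity codes such as N carry information, so it is safer to look at the sequence again than to delete them without noticing.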