Figures for this lecture can be accessed by logging onto the course D2L site at https://uwlax.courses.wisconsin.edu/
Using Genomic DNA Sequences to Discover Genes
Overview
Cytogenetic maps, genetic maps, physical maps, and sequence for MANY different organisms:
Saccharomyces, C. elegans, Arabidopsis, Drosophila, zebrafish, human, etc.
The human genome was originally cut into small (!) pieces of approximately 1 Mbp, and each piece was cloned into a separate vector
Clones are sequenced, sometimes without regard to their original order
Can order the clones into contigs using markers and then assemble the sequence
Eventually, we have a contig that consists of an entire chromosome
Eventually, we have contigs of every chromosome
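The idea of building contigs from unordered clones can be sketched in a few lines of Python. Note that this is NOT the marker-based ordering described above: as a simplifying assumption, it orders fragments purely by sequence overlap (a greedy suffix/prefix merge). The function names, the toy reads, and the min_len cutoff are all hypothetical and chosen for illustration only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, min_len=3):
    """Greedily merge the pair of fragments with the largest overlap."""
    frags = list(reads)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps: the remaining fragments are separate contigs
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


# Toy "clones" (made up), given out of order
reads = ["CTGAGGTT", "ATGCGTAC", "GTACCTGA"]
contigs = assemble(reads)
```

Real genomes are full of repeats, which is exactly where a purely greedy overlap approach goes wrong and where independent markers become essential.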
Problem #1
In mammals, only about 1-2% of the genome encodes protein; about 35% is repetitive DNA; about 60% is spacer DNA
How do we sort through the genome to find the genes?
Dealing with "The Needle in a Haystack" (or "How to find protein-encoding genes hiding in the midst of all that other DNA")
1. Look for ORFs:
5'- ACGTGCTAATGCGAGCAGCAGCGATCGAGCTGATGCAGGCTTAAGCTAGCTAG
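A minimal sketch of an ORF scan, applied to the example sequence above. It assumes the simplest possible definition (an ATG followed in the same reading frame by TAA, TAG, or TGA) and looks only at the three forward-strand frames; the function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, orf) for each ATG...stop ORF on the forward strand.

    Positions are 0-based and end is exclusive. Only the first ATG in a
    frame opens an ORF; later in-frame ATGs are treated as internal codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


example = "ACGTGCTAATGCGAGCAGCAGCGATCGAGCTGATGCAGGCTTAAGCTAGCTAG"
orfs = find_orfs(example)
```

For this sequence the scan reports a single ORF, running from the first ATG to the in-frame TAA. A real gene finder would also scan the reverse complement and require a much longer minimum length.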
Splicing:
Often, exons are only about 50 codons long, while introns can be as large as 10 kb
5'-ACGTGCTAATGCGAGCAGCAGCgtacgtagctgatgctgatgtcagcGATCGAGCTGATGCAGGCTTAAGCTAGCTAG
Consensus splice sites help computer programs find ORFs even when disrupted by introns
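To see why introns disrupt ORFs, here is a small sketch that splices the example above. It does NOT model real splice-site recognition: as an assumption, it relies only on the notation used in these notes (lowercase = intron, uppercase = exon). The function name is hypothetical.

```python
import re


def splice(pre_mrna):
    """Remove lowercase (intron) runs; return (mature sequence, introns)."""
    introns = re.findall(r"[a-z]+", pre_mrna)
    mature = re.sub(r"[a-z]+", "", pre_mrna)
    return mature, introns


pre = ("ACGTGCTAATGCGAGCAGCAGC"
       "gtacgtagctgatgctgatgtcagc"
       "GATCGAGCTGATGCAGGCTTAAGCTAGCTAG")
mature, introns = splice(pre)

# The intron is 25 nt long, which is not a multiple of 3, so leaving it
# in place would shift the reading frame of everything downstream.
frame_shift = len(introns[0]) % 3
```

After splicing, the mature sequence is identical to the uninterrupted ORF example given under "Look for ORFs". The intron here begins with gt, matching the 5' splice-site consensus; gene-finding programs score such signals statistically rather than relying on case.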
2. Look for CpG islands:
3. Look for association with ESTs:
4. Look for association with known cDNAs:
5. Look for similarity to known mouse or (now) rat genes:
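As an illustration of approach 2 (CpG islands), the sketch below computes the two statistics usually used to flag a candidate island: GC content and the observed/expected CpG ratio. The thresholds (at least 200 bp, GC content above 50%, ratio above 0.6) are commonly quoted criteria and are an assumption here, not something stated in this lecture; the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # number of CpG dinucleotides
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content, and CpG-ratio thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_ratio
```

In practice these statistics are computed in a sliding window along the genome, and windows that pass the thresholds are merged into candidate islands.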
Automation in Gene Prediction
Obviously (I hope) we want to make this automated
GENSCAN
Fly genome AE003847.5 sequence
Results
Predictions: about 23,000 genes; maybe about 90,000 proteins in humans
Problem #2
What about noncoding RNAs? rRNA, tRNA, miRNA, etc.
Look for similarity to known RNAs from other critters