 Lec 3.2

Figures for this lecture can be accessed by logging onto the course D2L site at

Using Genomic DNA Sequences to Discover Genes


Cytogenetic maps, genetic maps, physical maps, and sequence for MANY different organisms: Saccharomyces, C. elegans, Arabidopsis, Drosophila, zebrafish, human, etc.



The human genome was originally cut into small (!) pieces of approximately 1 Mbp, and each piece was cloned into a separate vector


Clones are sequenced, sometimes without regard to their original order


The clones can then be ordered into contigs using markers, and the sequence assembled
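A minimal sketch of the marker idea, assuming each clone has been scored for a set of markers and that two clones sharing any marker overlap. The clone names, marker names, and the function `build_contigs` are hypothetical, and the sketch only groups clones into contigs; it does not work out their order within a contig.

```python
from collections import defaultdict

def build_contigs(clone_markers):
    """Group clones into contigs; clones sharing a marker are assumed to overlap."""
    # Index: marker -> every clone that carries it.
    marker_to_clones = defaultdict(set)
    for clone, markers in clone_markers.items():
        for marker in markers:
            marker_to_clones[marker].add(clone)

    seen = set()
    contigs = []
    for start in sorted(clone_markers):
        if start in seen:
            continue
        # Walk outward from this clone through shared markers.
        stack = [start]
        seen.add(start)
        contig = []
        while stack:
            clone = stack.pop()
            contig.append(clone)
            for marker in clone_markers[clone]:
                for other in marker_to_clones[marker]:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        contigs.append(sorted(contig))
    return contigs

# Hypothetical marker content for five clones.
clones = {
    "cloneA": {"m1", "m2"},
    "cloneB": {"m2", "m3"},
    "cloneC": {"m3", "m4"},
    "cloneD": {"m7", "m8"},
    "cloneE": {"m8", "m9"},
}
contigs = build_contigs(clones)
print(contigs)
```

Here clones A, B, and C chain together through markers m2 and m3, while D and E form a second contig; a clone linking m4 to m7 would merge the two.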



Eventually, we have a contig that consists of an entire chromosome

Eventually, we have contigs of every chromosome



Problem #1

In mammals, only about 1-2% of the genome encodes protein; about 35% is repetitive DNA; about 60% is spacer DNA



How do we sort through the genome to find the genes?



Dealing with "The Needle in a Haystack" (or "How to find protein-encoding genes hiding in the midst of all that other DNA")

1. Look for ORFs:






Often, exons are only about 50 codons long, while introns can be as large as 10 kb
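A rough sketch of a naive ORF scan, to show why short exons are a problem. It reads only the forward strand in three frames and reports ATG-to-stop stretches of at least `min_codons` codons. The function name `find_orfs`, the default threshold, and the toy sequence are assumptions for illustration, not the method used by any particular gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=50):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3    # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs

# Toy sequence: a 5-codon ORF (ATG + 4 x GCT) followed by a stop codon.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))   # finds the short ORF
print(find_orfs(toy))                 # default threshold misses it
```

A length threshold high enough to reject chance ORFs will also reject an exon-sized stretch, which is why a simple ORF scan alone is not enough in a genome with introns.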




Consensus splice sites help computer programs find ORFs even when disrupted by introns
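A toy sketch of how splice signals can help, assuming only the most basic consensus: introns usually begin with GT and end with AG. The code lists every GT...AG span as a candidate intron and keeps those whose removal leaves a complete ORF. Real programs score much longer consensus sequences around each site; the names `candidate_introns` and `is_complete_orf` and the example gene are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def candidate_introns(seq, min_len=6, max_len=10000):
    """List (start, end) spans that begin with GT and end with AG."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    spans = []
    for d in donors:
        for a in acceptors:
            length = a + 2 - d              # span includes both GT and AG
            if min_len <= length <= max_len:
                spans.append((d, a + 2))
    return spans

def is_complete_orf(seq):
    """True if seq is ATG ... stop with no in-frame stop in between."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (len(seq) % 3 == 0 and len(codons) >= 2
            and codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[1:-1]))

# Toy gene: exon 1, a 14-bp intron with an in-frame stop, exon 2.
gene = "ATGGCC" + "GTAAGTTGATCCAG" + "GCCTAA"
spans = candidate_introns(gene)
restoring = [(s, e) for s, e in spans if is_complete_orf(gene[:s] + gene[e:])]
print(spans)       # all GT...AG candidates
print(restoring)   # candidates whose removal restores an ORF
```

Two spans fit the GT...AG pattern here, but only one of them, when spliced out, joins the exons into an uninterrupted reading frame.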



2. Look for CpG islands:
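A small sketch of one way to flag CpG-rich windows. The thresholds (a 200 bp window, at least 50% G+C, and an observed/expected CpG ratio of at least 0.6) follow a commonly used working definition and are an assumption here, not something stated in this lecture. The function names are hypothetical.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    observed = window.count("CG")          # CpG dinucleotides seen
    expected = c * g / n if n else 0.0     # expected if C and G were independent
    gc_fraction = (c + g) / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    return gc_fraction, ratio

def cpg_island_windows(seq, window=200, step=200, min_gc=0.5, min_ratio=0.6):
    """Return start positions of windows that pass both thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc_fraction, ratio = cpg_stats(seq[start:start + window])
        if gc_fraction >= min_gc and ratio >= min_ratio:
            hits.append(start)
    return hits

# Toy sequence: AT-rich background with a 200 bp CpG-rich block in the middle.
toy = "AT" * 100 + "CG" * 100 + "AT" * 100
hits = cpg_island_windows(toy)
print(hits)
```

With non-overlapping windows only the middle block passes; a smaller `step` would give overlapping windows and a finer picture of where the island starts and ends.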



3. Look for association with ESTs:



4. Look for association with known cDNAs:
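A toy sketch of the cDNA (or EST) idea: because a cDNA lacks introns, lining it up against genomic DNA shows where the exons are. The code below assumes a perfect match with no sequencing errors and uses a greedy exact-match extension, which real alignment programs do not rely on; `map_cdna_to_genome` and all sequences are hypothetical.

```python
def map_cdna_to_genome(cdna, genome, min_block=8):
    """Return genomic (start, end) blocks matching the cDNA in order, or None."""
    blocks = []
    c_pos = 0   # position reached in the cDNA
    g_pos = 0   # search only downstream of the previous block
    while c_pos < len(cdna):
        seed = cdna[c_pos:c_pos + min_block]
        if len(seed) < min_block:
            return None                     # leftover too short to place
        hit = genome.find(seed, g_pos)
        if hit == -1:
            return None                     # no exact match for this seed
        length = min_block
        # Extend the match base by base until cDNA and genome disagree.
        while (c_pos + length < len(cdna)
               and hit + length < len(genome)
               and cdna[c_pos + length] == genome[hit + length]):
            length += 1
        blocks.append((hit, hit + length))
        c_pos += length
        g_pos = hit + length
    return blocks

exon1 = "ATGGCCAAGCTT"
intron = "GTAAGTCCCCCCTTTTCAG"
exon2 = "CATCGATTACGCTAA"
genome = "TTTT" + exon1 + intron + exon2 + "CCCC"
cdna = exon1 + exon2

blocks = map_cdna_to_genome(cdna, genome)
print(blocks)                               # two exon blocks
print(genome[blocks[0][1]:blocks[1][0]])    # the gap between them is the intron
```

The gap between consecutive blocks is the inferred intron, and in this example it begins with GT and ends with AG, as the splice consensus predicts.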




5. Look for similarity to known mouse or (now) rat genes:
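A minimal sketch of the cross-species idea, assuming the human and mouse regions have already been aligned without gaps (a real comparison would need an alignment step first). Windows with high identity are candidates for conserved, possibly coding, sequence. The function `window_identity` and both sequences are made up for illustration.

```python
def window_identity(seq_a, seq_b, window=10):
    """Fraction of identical positions in each non-overlapping window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = [x == y for x, y in zip(seq_a.upper(), seq_b.upper())]
    return [
        sum(matches[i:i + window]) / window
        for i in range(0, len(matches) - window + 1, window)
    ]

# Toy aligned regions: a conserved block followed by a diverged block.
human = "ATGGCCAAGC" + "TTTTTTTTTT"
mouse = "ATGGCTAAGC" + "ACGACGACGA"
identities = window_identity(human, mouse)
print(identities)
```

The first window differs at a single position and scores high, while the second shares nothing, which is the pattern one hopes to see for an exon next to non-coding sequence.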




Automation in Gene Prediction

Obviously (I hope) we want to make this automated


Fly genome

AE003847.5 sequence


Predictions: about 23,000 genes; maybe about 90,000 proteins in humans



Problem #2

What about noncoding RNAs? rRNA, tRNA, miRNA, etc.




Look for similarity to known RNAs from other critters






© 2002 The Board of Regents of the University of Wisconsin System.
