Lec 3.2

Figures for this lecture can be accessed by logging onto the course D2L site at https://uwlax.courses.wisconsin.edu/

Using Genomic DNA Sequences to Discover Genes

Overview

We now have cytogenetic maps, genetic maps, physical maps, and sequence for MANY different organisms: Saccharomyces, C. elegans, Arabidopsis, Drosophila, zebrafish, human, etc.

The human genome was originally cut into small (!) pieces of approximately 1 Mbp, and each piece was cloned into a separate vector

Clones are sequenced, sometimes without regard to their original order

Can order the clones into contigs using markers and then assemble the sequence
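As a rough illustration of ordering clones by markers, here is a minimal Python sketch. It assumes each clone has been scored for a set of markers and that the order of those markers along the chromosome is already known from a map. The clone names, marker names, and the function build_contigs are hypothetical and not part of the lecture. Clones are sorted by their leftmost marker and chained into one contig for as long as each clone shares a marker with the clones before it.

```python
def build_contigs(clones, marker_order):
    """Group clones into ordered contigs using shared markers.

    clones: dict mapping clone name -> set of markers found on that clone
            (assumed: every clone carries at least one marker).
    marker_order: list of markers in their known map order (an assumption).
    """
    rank = {marker: i for i, marker in enumerate(marker_order)}

    def span(name):
        positions = [rank[m] for m in clones[name]]
        return (min(positions), max(positions))

    contigs = []
    covered = set()  # markers already present in the contig being built
    for name in sorted(clones, key=span):
        if contigs and clones[name] & covered:
            contigs[-1].append(name)      # overlaps the current contig
            covered |= clones[name]
        else:
            contigs.append([name])        # no shared marker: start a new contig
            covered = set(clones[name])
    return contigs


# Hypothetical clones, deliberately listed out of order
example_clones = {
    "cloneD": {"M5", "M6"},
    "cloneB": {"M2", "M3"},
    "cloneE": {"M6"},
    "cloneA": {"M1", "M2"},
    "cloneC": {"M3"},
}
example_order = ["M1", "M2", "M3", "M4", "M5", "M6"]

print(build_contigs(example_clones, example_order))
# → [['cloneA', 'cloneB', 'cloneC'], ['cloneD', 'cloneE']]
```

No clone carries marker M4, so the sketch reports two contigs with a gap between them. Once the clones in a contig are ordered, their sequences can be assembled in that order.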

Eventually, we have a contig that consists of an entire chromosome

Eventually, we have contigs of every chromosome

Problem #1

In mammals, only about 1-2% of the genome encodes protein; about 35% is repetitive DNA; about 60% is spacer DNA

How do we sort through the genome to find the genes?

Dealing with "The Needle in a Haystack" (or "How to find protein-encoding genes hiding in the midst of all that other DNA")

1. Look for ORFs:

5'-ACGTGCTAATGCGAGCAGCAGCGATCGAGCTGATGCAGGCTTAAGCTAGCTAG
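A minimal Python sketch of an ORF scan on the example sequence above. It assumes an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG, or TGA) and looks at the three forward reading frames only. The function name find_orfs and the min_codons cutoff are illustrative choices, not from the lecture. A real gene finder would also scan the reverse strand and require a much longer ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, orf_sequence) tuples, longest ORF first.

    start and end are 0-based positions; end is exclusive and includes the stop codon.
    Only the three forward frames are scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None                   # look for the next ATG in this frame
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


SEQ = "ACGTGCTAATGCGAGCAGCAGCGATCGAGCTGATGCAGGCTTAAGCTAGCTAG"
print(find_orfs(SEQ)[0])
# → (8, 44, 'ATGCGAGCAGCAGCGATCGAGCTGATGCAGGCTTAA')
```

The single ORF found here is 12 codons long, counting the TAA stop.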

Splicing:

Often, exons are only about 50 codons long, and introns can be as large as 10 kb

5'-ACGTGCTAATGCGAGCAGCAGCgtacgtagctgatgctgatgtcagcGATCGAGCTGATGCAGGCTTAAGCTAGCTAG

Consensus splice sites help computer programs find ORFs even when disrupted by introns
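A small Python sketch of why the intron matters. It relies on the lecture's notation that lowercase letters mark the intron, which is a shortcut: a real program has to predict the intron boundaries from consensus splice sites rather than read them off the case. The helper names splice and first_orf are hypothetical. The sketch removes the intron, confirms that the intron starts with the consensus GT, and compares the ORF read from the spliced sequence with the one read straight through the intron.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Lecture example: uppercase = exon, lowercase = intron
GENOMIC = ("ACGTGCTAATGCGAGCAGCAGC"
           "gtacgtagctgatgctgatgtcagc"
           "GATCGAGCTGATGCAGGCTTAAGCTAGCTAG")

def splice(seq):
    """Remove lowercase (intron) runs; return (spliced sequence, list of introns)."""
    introns = re.findall(r"[a-z]+", seq)
    return re.sub(r"[a-z]+", "", seq), introns

def first_orf(seq):
    """Read from the first ATG to the first in-frame stop; '' if none is found."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return ""

mrna, introns = splice(GENOMIC)
print(introns[0].startswith("gt"))             # intron begins with the consensus GT
print(first_orf(mrna))                         # ORF after splicing
print(first_orf(GENOMIC.upper()) == first_orf(mrna))  # reading through the intron differs
```

Reading straight through the intron still produces an open reading frame, but not the one the spliced message encodes, which is why gene finders have to model splice sites.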

2. Look for CpG islands:
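A minimal Python sketch of how a stretch of DNA might be scored as a candidate CpG island. The thresholds (at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) are commonly used criteria and are an assumption here; they are not given in the lecture. The function names cpg_stats and looks_like_cpg_island are illustrative.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    w = window.upper()
    n = len(w)
    if n == 0:
        return 0.0, 0.0
    c_count = w.count("C")
    g_count = w.count("G")
    observed = w.count("CG")                 # CpG dinucleotides seen
    expected = c_count * g_count / n         # CpGs expected from base composition
    ratio = observed / expected if expected else 0.0
    return (c_count + g_count) / n, ratio

def looks_like_cpg_island(window, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC, and observed/expected thresholds."""
    gc_fraction, ratio = cpg_stats(window)
    return len(window) >= min_len and gc_fraction > min_gc and ratio > min_ratio


print(looks_like_cpg_island("CG" * 100))   # GC-rich and CpG-rich → True
print(looks_like_cpg_island("CAG" * 70))   # GC-rich but no CpG at all → False
```

The second example shows why GC content alone is not enough: the sequence is two-thirds G and C but contains no CpG dinucleotide.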

3. Look for association with ESTs:

4. Look for association with known cDNAs:

5. Look for similarity to known mouse or (now) rat genes:
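Strategies 3 through 5 all come down to asking whether a genomic region resembles a sequence that is already known to be expressed or to be a gene. The Python sketch below is a crude stand-in for a real alignment tool: it simply measures what fraction of the short words (k-mers) in a query, such as an EST or cDNA, also occur in a genomic region. The function names and the choice of k = 8 are assumptions for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmer_fraction(query, genomic, k=8):
    """Fraction of the query's k-mers that also appear in the genomic sequence."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    return len(query_kmers & kmers(genomic, k)) / len(query_kmers)


# The lecture's toy gene: genomic copy (with intron) and its spliced cDNA
GENOMIC_REGION = ("ACGTGCTAATGCGAGCAGCAGC"
                  "gtacgtagctgatgctgatgtcagc"
                  "GATCGAGCTGATGCAGGCTTAAGCTAGCTAG")
CDNA = "ACGTGCTAATGCGAGCAGCAGCGATCGAGCTGATGCAGGCTTAAGCTAGCTAG"

score = shared_kmer_fraction(CDNA, GENOMIC_REGION)
print(0.0 < score < 1.0)   # most cDNA words match; some spanning the exon junction do not
```

Most of the cDNA's words are found in the genomic region, but some of those that span the exon-exon junction are missing because the intron interrupts them there. A real search would use an alignment program that tolerates mismatches and gaps rather than exact word matches.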

Automation in Gene Prediction

Obviously (I hope) we want to make this automated

GENSCAN

Fly genome

AE003847.5 sequence

Results

Predictions: about 23,000 genes; maybe about 90,000 proteins in humans

Problem #2

What about noncoding RNAs? rRNA, tRNA, miRNA, etc.

Look for similarity to known RNAs from other critters

Comments regarding this site or its links can be emailed to Scott Cooper.