The goal of the last labs is to become familiar with the programs used to analyze protein sequences and structures, using trypsin and rhodopsin as common examples. You will each be assigned a protein to explore individually, using the techniques we go through in lab. There should be time during lab for you to work on your individual projects, although some work may need to be done outside of class.
Predictions from primary amino acid sequences
When studying a protein, the first information we will have available will be the amino acid sequence. Currently this is obtained primarily as a translation of DNA sequence.
We will focus on three areas of prediction from amino acid sequence:
- Secondary structure
- Motifs
- Transmembrane domains
Protein Review
General Amino Acid Structure
Three central atoms
1. Alpha amino group – basic group
2. Alpha carboxyl group – acidic group
3. Alpha carbon – forms bond with R-group

Protein Structure
Primary Structure (amino acid sequence)
Secondary Structure (alpha helix and beta sheet)
Motifs (structural)
Domains (active units)
Tertiary Structure (overall 3-D structure)
Primary Structure
The peptide bond has partial double-bond character, so the backbone can rotate only around the
phi (φ, N–Cα) and psi (ψ, Cα–C) bonds.

Secondary Structure
alpha helix
Right-handed
3.6 aa/turn
1.5 Å rise/aa
5.4 Å rise/turn
R-groups point out, away from the center of the helix
C=O bond of the nth residue points toward the N-H group of the (n + 4) residue
Dipole moment along the helical axis (amino end more +, carboxy end more -)
beta sheet
Antiparallel – neighboring hydrogen bonded polypeptide chains run in opposite directions
Parallel – neighboring hydrogen bonded chains extend in the same direction
Characteristics of beta sheets
Hydrogen bonds shared between neighboring chains
Successive R-groups extend to opposite sides of the sheet, with a two-residue repeat distance of 7.0 Å
Exhibit a right-handed twist
Parallel sheets are less stable than anti-parallel sheets
Topology (connection) between neighboring anti-parallel strands can be small and simple
Topology between neighboring parallel strands is typically long and complex
Turns and Loops – used to join helices and sheets together – allow protein structure to quickly change direction and keep its compact shape
β-turns (bends) – involved when the peptide chain rapidly reverses direction
Characteristics of β-turns
Tight 180° turn – 4 amino acids involved
Glycine and proline are usually involved
Glycine because it is flexible
Proline because it can adopt a cis configuration
Glycine usually residue 3
Proline residue 2
Because of the constraints on rotation of the backbone in an alpha-helix or
beta-sheet, the psi and phi angles tend to cluster in a specific grouping for
these two structures. These are best represented graphically in a
Ramachandran plot of the psi and phi angles of each amino acid in a protein.
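As a rough illustration of how these clusters separate, the sketch below assigns a (phi, psi) pair to a region of the plot. The function name and the rectangular boundaries are assumptions chosen for illustration only; real Ramachandran regions have irregular outlines drawn from observed structures.

```python
def classify_phi_psi(phi, psi):
    """Roughly assign a (phi, psi) pair, in degrees, to a Ramachandran region.

    The box boundaries are illustrative assumptions, not a published standard.
    """
    # Right-handed alpha helix clusters near phi = -57, psi = -47.
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha-helix region"
    # Beta strands cluster near phi = -120 to -140, psi = +113 to +135.
    # The region wraps around the top/bottom edge of the plot.
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta-sheet region"
    # Positive phi is uncommon and is seen mostly for glycine.
    if 20 <= phi <= 120 and -30 <= psi <= 90:
        return "left-handed helix region"
    return "other"


if __name__ == "__main__":
    print(classify_phi_psi(-57, -47))    # ideal right-handed alpha helix
    print(classify_phi_psi(-139, 135))   # ideal antiparallel beta strand
```

Applying a function like this to every residue of a protein gives a quick count of how many residues fall in helix-like versus sheet-like conformations.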
Prediction of Secondary Structure
Under BIOLOGY WORKBENCH, go to protein tools and use NDJINN to locate files containing the protein sequences of human trypsinogen (TRY1) and rhodopsin (HSU49742). When using NDJINN select GBPRI to only search primate sequences.
Hint: When you do this on your protein, you should use the
sequence from PDBFINDER.
Secondary structures are the alpha helices and beta sheets in proteins. Programs use different algorithms to predict these structures, but often arrive at similar conclusions. Such information can be useful in predicting the approximate structure of a protein or a particular region of interest in a protein.
The program PELE on BIOLOGY WORKBENCH is very useful in that it predicts secondary structure using eight different programs and aligns their results. This allows you to rapidly compare the results from several models.
These algorithms all are based upon the observed frequency of specific amino
acids in different secondary structures. There is a good description of
how the algorithm is actually performed on pages 163 and 164 of your text.
Essentially the program looks at a window of 6 amino acids, and if at least 4
have values >1 for a given secondary structure then it assumes that the
structure may be present. Common helix, sheet and turn residues are in
bold below.
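The window rule described above can be sketched as follows. The propensity values are the commonly quoted Chou-Fasman helix parameters and should be checked against the table in your text; the function name and defaults are assumptions. This covers only the first (nucleation) step for helices, not the full algorithm or any of the programs in PELE.

```python
# Commonly quoted Chou-Fasman helix propensities, P(alpha).
# Treat these as approximate and verify them against your text.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def helix_nucleation_sites(seq, window=6, min_formers=4):
    """Return 1-based start positions of windows that may nucleate a helix.

    A window qualifies when at least `min_formers` of its residues have a
    helix propensity greater than 1.
    """
    sites = []
    for i in range(len(seq) - window + 1):
        # Unknown residue codes count as non-formers.
        formers = sum(1 for aa in seq[i:i + window] if P_ALPHA.get(aa, 0.0) > 1.0)
        if formers >= min_formers:
            sites.append(i + 1)
    return sites


if __name__ == "__main__":
    print(helix_nucleation_sites("GGGGEALKMAGGGG"))
```

The same loop works for beta sheets if you swap in the sheet propensities and the window rule given in your text.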

Analyze your two sequences using PELE. We will discuss the results as a group.
Do you see any major differences between the two proteins with respect to alpha helices and beta sheets?
Do the different programs seem to vary in their prediction of alpha helices and beta sheets?
Motifs and Domains
Conserved primary sequences or simple combinations of a few secondary structural elements.
MOTIFS (sequence)
Glycosylation sites, phosphorylation sites, ATP binding site

MOTIFS (structural)
Helix-turn-helix DNA binding motif

Calcium binding motif

Epidermal growth factor structural motif

DOMAINS
Domains are fundamental units of tertiary structure.
These are polypeptide chains that can fold independently into a stable tertiary structure.
They can also form units of function.
Domains are built from structural motifs.
Protein Motifs refer to short amino acid sequences which often give the protein a specific function. Examples include glycosylation and phosphorylation sites, as well as conserved residues in enzyme active sites. The presence of specific motifs can give a researcher clues to the function of a protein, and may be valuable in designing experiments to test the function or activity of a protein.
Hint: When you do this on your protein, you should use
the sequence from PDBFINDER - this will help you locate your active site on the
3D structure later.
We will use three programs to examine motifs.
1. PROSITE – a database of biologically significant patterns and profiles, used to determine the function of an uncharacterized region of a protein. The patterns are formulated so that they identify which family of proteins a new sequence belongs to, or which domain it may contain. A motif, or fingerprint, is typically recognized as a cluster of residue types known to be characteristic of it. Using protein sequence patterns and profiles to determine function will be a challenge for the genome project for years to come (proteomics is born).
Analyze each sequence using PROSEARCH. This program scans the sequences against the PROSITE database for many common motifs.
You will get the following results for trypsinogen.
Access#   From->To   Name                  Doc#
_______   ________   ____________________  _________
PS00002   142->146   GLYCOSAMINOGLYCAN     PDOC00002
PS00005   114->117   PKC_PHOSPHO_SITE      PDOC00005
PS00006   150->154   CK2_PHOSPHO_SITE      PDOC00006
PS00006   215->219   CK2_PHOSPHO_SITE      PDOC00006
PS00008   26->32     MYRISTYL              PDOC00008
PS00008   50->56     MYRISTYL              PDOC00008
PS00008   145->151   MYRISTYL              PDOC00008
PS00008   191->197   MYRISTYL              PDOC00008
PS00008   208->214   MYRISTYL              PDOC00008
PS00134   59->65     TRYPSIN_HIS           PDOC00124
PS00135   194->206   TRYPSIN_SER           PDOC00124
Click on PDOC00002.
What is a glycosaminoglycan?
What can you learn about the possible activity of this protein from the last two entries?
What can you predict about the activity of rhodopsin from the last two entries in its PROSEARCH file?
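To see what a PROSITE pattern match involves, the sketch below converts a simple pattern into a regular expression and scans a sequence with it. The function names and the toy sequence are assumptions for illustration; this is not how PROSEARCH itself is implemented. It handles only the basic syntax (x, [..], {..} and repeat counts), not terminal anchors or profiles. The two patterns in the example are the PROSITE patterns for PKC_PHOSPHO_SITE and CK2_PHOSPHO_SITE.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                       # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {P} means any residue except P
        # [ST] and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def scan(seq, pattern):
    """Return (start, end, matched_text) per hit, 1-based and inclusive."""
    # The lookahead lets overlapping hits be reported as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1)), m.group(1))
            for m in regex.finditer(seq)]


if __name__ == "__main__":
    toy = "AASGRTLKDE"                       # made-up sequence
    print(scan(toy, "[ST]-x-[RK]"))          # PKC phosphorylation site pattern
    print(scan(toy, "[ST]-x(2)-[DE]"))       # CK2 phosphorylation site pattern
```

Patterns this short match by chance in most proteins, which is why a hit such as a phosphorylation site is only a hint, while a longer family-specific pattern is much stronger evidence.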
2. RPSBLAST performs a blast search of your sequence vs. a database of conserved domains in families of proteins. Your sequence is compared to the consensus sequence of many families of proteins to look for a match. This is very useful in identifying which family your protein belongs to, especially over larger domains.
3. BLIMPS is similar to RPSBLAST, except that it looks for specific blocks or domains of sequence similarity. A protein may overall have relatively low similarity to another protein, but if it has high similarity in specific important regions it may have the same activity and be a homologous protein. BLIMPS compares a protein or nucleic acid sequence against the BLOCKS database of conserved protein motifs. The scores for high-scoring BLOCKS found within the query sequence are totalled, and a family classification is made based on the total score for each block found in the query sequence. Individual block scores are listed beneath the family classification along with the highest scoring alignments.
Transmembrane Domains are hydrophobic alpha helices or beta sheets which can span lipid bilayers. It takes about 20 amino acids to span a lipid bilayer in an alpha helix. Programs can detect these transmembrane domains by looking for the presence of an alpha helix 20 amino acids long which contains hydrophobic amino acids.
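The figure of about 20 amino acids follows from simple geometry. The sketch below assumes a hydrophobic bilayer core about 30 Å thick, a rise of 1.5 Å per residue in an alpha helix, and roughly 3.3 Å per residue in an extended beta strand. These are approximate textbook values, the function name is hypothetical, and the segment is assumed to cross the membrane perpendicular to it.

```python
import math

HELIX_RISE = 1.5      # Å per residue along an alpha helix
STRAND_RISE = 3.3     # Å per residue along a beta strand (approximate, assumed)
BILAYER_CORE = 30.0   # Å, assumed thickness of the hydrophobic core


def residues_to_span(thickness, rise_per_residue):
    """Smallest whole number of residues whose length covers `thickness`."""
    return math.ceil(thickness / rise_per_residue)


if __name__ == "__main__":
    print("alpha helix:", residues_to_span(BILAYER_CORE, HELIX_RISE))
    print("beta strand:", residues_to_span(BILAYER_CORE, STRAND_RISE))
```

A beta strand needs far fewer residues than a helix to cross the same distance, which is one reason a search for 20-residue hydrophobic helices can miss beta-sheet transmembrane domains.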
GREASE allows you to generate a Kyte-Doolittle hydropathy profile. It does not predict secondary structure, so it will pick up both alpha-helical and beta-sheet transmembrane domains. Numbers greater than 0 indicate increased hydrophobicity; numbers less than 0 indicate an increase in hydrophilic amino acids.
TMHMM allows you to predict the location of transmembrane helices and of the intervening loop regions. This program will also predict which loops between the helices will be on the inside or outside of the cell or organelle. This program will not detect beta sheet transmembrane domains.
TMAP uses a Kyte-Doolittle hydropathy profile to detect membrane-spanning domains. Unlike TMHMM, it does not require that the domain be an alpha helix. It also provides the amino acid numbers for the transmembrane domain.
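A hydropathy profile of this kind can be sketched as a sliding-window average of Kyte-Doolittle values. The scale below is the published Kyte-Doolittle scale; the function name and the default window of 9 residues are assumptions, and the window GREASE actually uses may differ.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq, window=9):
    """Return (center_position, mean_hydropathy) for each full window.

    Positions are 1-based. Sequences shorter than the window give an
    empty list. Residue codes not in KD raise a KeyError.
    """
    half = window // 2
    profile = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        profile.append((i + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Made-up sequence: a hydrophobic stretch flanked by charged residues.
    for position, score in hydropathy_profile("KKDELLIVALLVAILGKRDE"):
        print(position, round(score, 2))
```

Short windows highlight surface-exposed stretches, while longer windows of about 19 to 21 residues are the usual choice when looking for membrane-spanning segments.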
Analyze the rhodopsin sequence using GREASE, TMHMM and TMAP. We will discuss the results as a group.
Hint: When you do this on your protein, you should use
the sequence from SWISSPROT-HUMAN - if your protein has a signal peptide it will
be cut off in the PDBFINDER sequence.
Which protein seems to be the most hydrophobic?
Which could span a membrane?
How long is a membrane spanning sequence?
Would TMHMM detect beta sheets that span membranes?

Rhodopsin in a membrane
Now analyze the trypsin sequence with TMAP. Trypsin is a soluble
protein and is not found in a membrane. However, it appears to have a
short membrane spanning sequence at its amino terminus (see below). This is a signal
peptide, which is used to target the precursor to trypsin (trypsinogen) into the
Endoplasmic Reticulum so that it can be secreted. The signal peptide is
then cleaved off inside of the ER.

Your Assignment
On the page below we have listed 17 sequences. You will be assigned one of these sequences for your final project. Perform the three types of primary amino acid sequence analysis we just did in lab on your assigned sequence. From this portion of the lab we would like you to predict the function of the protein based upon its motifs, regions of secondary structure, and hydrophobic regions and transmembrane domains. You will be comparing these predictions to actual three dimensional structures later in this unit.
Report for this Unit (50 points total)
We would like a formal written report with the following
information. Don't paste in the questions; they are just there to help you stay
organized. You can create figures in your report by right-clicking on an
image, copying it, and pasting it into your report. Don't add lots of
extra output, i.e. names and accession numbers from Biology Workbench; include just the
figure and a figure legend, and then explain what it means in your well-written,
rational report.
1. Perform a BLASTP on your assigned sequence against the PDBFINDER
(sequence of the protein from a crystal structure) and SWISSPROT-HUMAN (sequence
from the DNA) databases in Biology Workbench (you can select both simultaneously
using the Ctrl key). Provide a brief one paragraph description of the protein you were assigned, i.e. what is it, what does it do,
is it in any biochemical pathways, in what organs is it found, is it intra
or extracellular, etc.
2. A description of what you could predict about the function of the protein based upon its
motifs and regions of secondary structure, and its location (intra or
extracellular) based upon hydrophobic regions and transmembrane domains (Lab 4.1). How well did these predictions match what you found out about your protein from the literature and from inspection of the 3D structure of the protein,
i.e. how many alpha helices and beta sheets were predicted, and how many were in
the 3D structure?
A PyMol image containing the 3D structure of your protein in cartoon form with
the secondary structures colored and the active site residues in space filling
format. You can identify the active site residues using PROSEARCH.
Include a description of the motifs found and images showing the predicted secondary structure, hydrophobic regions and transmembrane domains.
3. A description of the family of proteins to which your protein belongs (paralogs,
Lab 4.2). For this portion of the unit we would like you to identify and import
6-7 related human protein sequences (not 6-7 different sequences of the same protein) and align these sequences. Try to choose several
paralogs, not just the most closely related, but don't go much below a score of
100 or the alignments won't be very good. If there is a known motif for the class of protein your protein falls into, look for this motif in your aligned sequences.
Include a picture of the amino acid sequence alignment and the 3D alignment
from Consurf (Lab 4.3). Which regions seemed to be conserved and is this consistent with what you know about the active site or binding site of the protein?
4. A description of the evolution of your protein in different species (orthologs,
Lab 4.2). For this portion of the unit we would like you to identify and import
6-7 protein sequences of this same protein from different species and align these sequences. Try to choose some distantly related species for comparison, i.e. can you find this protein in yeast or bacteria? If there is a known motif for the class of protein your protein falls into, look for this motif in your aligned sequences.
Include a picture of the amino acid sequence alignment and the 3D alignment
from Consurf (Lab 4.3). Which regions seemed to be conserved and is this consistent with what you know about the active site or binding site of the protein?
5. A conclusion summarizing your findings. Specifically, comment on the
following:
Where is the most sequence similarity seen on the 3D structural
alignments? Were orthologs or paralogs more highly conserved? Is
this consistent with the relative functions of orthologs and paralogs?
Mention how these bioinformatics tools have helped you to better understand
the evolution and function of your assigned protein.
If there were limitations to the programs, mention these as well, i.e. how accurate were the bioinformatics programs you used in predicting
motifs, secondary structure, etc. This will be due at the end of the last day of class.