
Lab 4.1 Proteomics

The goal of the last labs is to become familiar with the programs used to analyze protein sequences and structures, using a few example proteins.  You will each be assigned a protein to explore individually, using the techniques covered in lab.  There should be time during lab for you to work on your individual projects, although some work may need to be done outside of class. This section focuses on proteomics, treated here as the study of protein structure and function using computers, databases, and protein sequences/structures. The first section provides a graphical review of protein structure from the primary through quaternary levels.

Protein Structure

Protein structures can be represented in a number of ways.  Cartoon versions are often used to emphasize where secondary structure elements exist.  Below, the same segment spanning amino acids Leu25 through Glu35 is represented in three ways (cartoon, lines and sticks, or spheres). In a cartoon drawing, only the mainchain is represented and all sidechain information is left out.  This viewing option allows for clarity and makes secondary structure elements easy to recognize. Lines and sticks are often used for smaller sections of a protein. The lines and sticks represent the bonds shared between atoms, and each atom type can be given its own color. In the drawing below, carbon atoms are green, nitrogen blue, oxygen red, and sulfur yellow. Finally, protein structures can be represented with each atom drawn as a sphere (volume). This type of representation shows how compact a protein is.

Lysozyme will be used to emphasize certain aspects of protein structure. Below is a cartoon version of lysozyme in two orientations. The cartoon drawing depicts the secondary structures colored by type: red (alpha-helices), yellow (beta-strands), and green (random coil).

Rhodopsin will be used to emphasize transmembrane domains at the end of this lab.


Primary structure (1°) is the amino acid sequence. Adjacent amino acids are linked via a peptide bond. The NCBI web site has a useful amino acid analysis interface, found under the ALL Resources (A-Z) link and then under Amino Acid Explorer.

Primary structure is typically defined as the sequence of amino acids read from the amino terminus to the carboxy terminus. Adjacent amino acids are covalently connected via a peptide bond formed on the ribosome by peptidyl transferase. Ribosomal RNA within the large ribosomal subunit, and not the ribosomal proteins, provides the actual peptide bond forming activity, which is why the ribosome is described as a ribozyme. Primary structure is typically written in one-letter code, with amino acid 1 providing the free amino terminus. Coloring in protein structures drawn in stick and line format is typically by element: carbon (green), oxygen (red), nitrogen (blue), and sulfur (yellow). The user may modify these colors, but typically the oxygen, nitrogen, and sulfur colors are left unchanged.
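As a small illustration of the one-letter convention, the sketch below converts a sequence written in three-letter code into a one-letter string, reading from amino terminus to carboxy terminus. The function name `to_one_letter` is made up for this example; the code table itself is the standard one.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence):
    """Convert a whitespace-separated three-letter sequence to one-letter code.

    The input is read left to right, i.e. amino terminus first.
    """
    codes = three_letter_sequence.split()
    return "".join(THREE_TO_ONE[code.capitalize()] for code in codes)


print(to_one_letter("Arg Asn Ala Glu His Lys His Ala Glu Leu Gly Pro"))
# prints RNAEHKHAELGP
```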

The primary sequence can be used to predict potential secondary structural elements. In addition, a number of primary sequences of the same protein from different sources can be aligned for identities and similarities. This type of alignment comparison is useful during the prediction of what are termed conserved domains.  More on this later.


Primary Sequence Search

There are many ways to search, import, view and analyze primary sequences from a variety of different databases. A very commonly used proteomics server is ExPASY. In the ExPASY site we could go under the databases section and find a protein sequence by name and organism. We can also search for a protein sequence using NDJINN within Biology Workbench.

ExPASY Style

  • Go to the ExPASY site and in the open box in the upper right hand corner search for lysozyme.

  • Under the Databases click on UniProtKB

  • In Query window type lysozyme

  • Click on the Fields link

  • Pulldown to Organism

  • Type in Chicken

  • Click Add & Search

  • Click on P00698

  • Under the Sequences area click on the FASTA link

  • Copy and paste just the protein sequence into a file (use Notepad Program)

  • This sequence will be used later
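If you would rather pull the sequence out of the saved FASTA text with a script than by hand, a minimal sketch might look like the following. The record shown is a made-up toy entry, not the real lysozyme record, and `read_fasta` is a hypothetical helper name.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif header is not None:
            residues.append(line)  # sequence lines are joined without breaks
    return header, "".join(residues)


# Hypothetical toy record, NOT the real lysozyme entry.
example = """>toy|X00000|EXAMPLE made-up header
MKVLAA
GRCEL
"""

header, sequence = read_fasta(example)
print(header)
print(sequence, len(sequence))
```

Only the sequence string (the second returned value) is what the later steps ask you to paste into the analysis tools.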

This web page provides a lot of information based upon the primary sequence you selected for lysozyme. The page provides the name of the protein, the length of the sequence, the source organism, the function of the protein, important sites and amino acids within the protein, and even a map of the secondary structural elements (more on this later).

Biology Workbench Style

  • Go to the Biology Workbench site.

  • In the Session Tools area Start New Session

  • Go into the Protein Tools area

  • Under the program area click on NDJINN

  • In the white box type in lysozyme AND chick

  • Select the SWISSPROT database to search

  • Click Search

  • Select the first sequence SWISSPROT:LYG_CHICK

  •  Import this sequence

Once you have this sequence in your Protein Tools area you can subject it to a number of other programs within Biology Workbench.


Secondary structure (2°) forms via a repeating pattern of hydrogen bonds shared between mainchain NH and CO groups. There are two common forms of secondary structural elements, termed alpha helix and beta-sheet.

Alpha-helices form via a series of intrachain hydrogen bonds.  The bonding forms between the carbonyl oxygen (CO) at residue number n and the amide hydrogen (NH) at residue n + 4.  This repeating arrangement of hydrogen bonds yields 3.6 amino acids per turn, leading to a 5.4 Å rise per turn of alpha-helix.

[Figure: Surface representation of lysozyme]

Alpha helices often have a sidedness to their appearance, where one side is predominantly polar and the other is clearly hydrophobic. This type of alpha-helix is termed amphipathic.
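The sidedness of a helix can be put into numbers. The sketch below assumes an ideal alpha-helix (3.6 residues and a 5.4 Å rise per turn, so 100 degrees and 1.5 Å per residue) and sums hydropathy values as vectors around the helix axis; a large result means one face is much more hydrophobic than the other. The Kyte-Doolittle scale is used here as an assumption (other scales are common for this calculation), and both test sequences are hypothetical.

```python
import math

RESIDUES_PER_TURN = 3.6   # ideal alpha-helix
RISE_PER_TURN = 5.4       # angstroms
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN        # about 100 degrees
RISE_PER_RESIDUE = RISE_PER_TURN / RESIDUES_PER_TURN   # about 1.5 angstroms

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def helix_length(n_residues):
    """Length in angstroms of an ideal alpha-helix with n_residues."""
    return n_residues * RISE_PER_RESIDUE


def hydrophobic_moment(sequence):
    """Mean hydrophobic moment per residue for an ideal alpha-helix."""
    x = y = 0.0
    for i, aa in enumerate(sequence):
        angle = math.radians(i * DEGREES_PER_RESIDUE)
        x += KYTE_DOOLITTLE[aa] * math.cos(angle)
        y += KYTE_DOOLITTLE[aa] * math.sin(angle)
    return math.hypot(x, y) / len(sequence)


# Hypothetical 18-residue helices (five full turns).
# Amphipathic: Leu on one face of the helical wheel, Lys on the other.
amphipathic = "".join(
    "L" if math.cos(math.radians(i * DEGREES_PER_RESIDUE)) >= 0 else "K"
    for i in range(18)
)
uniform = "L" * 18  # hydrophobic all the way around, so no sidedness

print(amphipathic)
print(round(helix_length(20), 1), "angstroms for a 20-residue helix")
print(hydrophobic_moment(amphipathic) > hydrophobic_moment(uniform))
```

The uniform poly-Leu helix is very hydrophobic, yet its moment is essentially zero because the contributions cancel around the wheel; the moment measures sidedness, not overall hydrophobicity.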















The alpha helix within lysozyme is shown as part of the complete 3D structure. The underside of the helix harbors the hydrophobic face, while the polar side projects out toward solvent.

Beta sheets also assemble from hydrogen bonds between carbonyl oxygen (CO) and amide hydrogen (NH) atoms.  However, these hydrogen bonds do not follow a regular repeating pattern along the sequence. They form across adjacent beta-strands and are thus difficult to predict from the primary sequence.

The beta helix is another way parallel beta-strands can be used to build tertiary and quaternary structure. The folding of this structure proceeds progressively from top to bottom by wrapping parallel beta-strands in coils, much like wrapping wire around a pencil. The images below show the coil that the parallel strands form.

Secondary Structure Prediction

Secondary structure prediction can be made using a number of different programs. These programs have been written based upon many known 3D protein structures. Protein structures have been solved and deposited into a database known as the RCSB Protein Data Bank (PDB) for over 33 years.  The first 13 protein structures were solved and deposited in 1976, and as of January 1, 2010 over 62,000 protein structures have been determined and submitted to the PDB. One of the original algorithms written to predict secondary structure was designed by two scientists, Chou and Fasman. Their Chou-Fasman Method has been used successfully to predict many secondary elements from only the primary sequence.


Chou-Fasman Method

(1) Used a set of known protein structures from the RCSB PDB (known structures with locations of alpha helices, beta sheets, and random coils). In fact, they also knew which amino acids were in each type of secondary element.

(2) Assigned probability of finding an amino acid in either helix, sheet or coil (not a secondary element)

(3) Designed a scanning algorithm for prediction of secondary structure from linear amino acid sequences

Experimental Dataset

(1) Select 15 known protein structures with 2473 total amino acids (AA)

(2) Break down where these 2473 AA were located (helix, sheet, coil)

(3) Derive a normalization procedure to predict

alpha – common AA are glu, ala, leu, and his

beta – common AA are met, val, ile, cys

coil – common AA are gly, ser, pro, asn

Normalization Factor

(1) Define f = frequency of a certain AA in helix, sheet or coil (number of that AA in the secondary element / total number of that AA)

(2) Define average frequency <f> = fraction of all AA in the data set found in that category (total AA in helix, sheet or coil / total AA)

(3) Define protein conformational parameter for each AA as P = f/<f> (provides normalization factor for predicting whether an AA is found in helix, sheet or coil)

(4) P > 1.0 is a strong indication that the AA will be found within that secondary element


Prediction Rules

(1) Nucleation Point - cluster of 4 out of 6 helix formers (Pa > 1.0) or 3 out of 5 beta formers (Pb > 1.0)

(2) Helix/Beta Termination - extend in both directions until a tetrapeptide with average P < 1.0 is hit

(3) Pro is not allowed in helices, nor Glu/Pro in sheets

(4) Boundaries - Pro, Asp, Glu prefer the N-terminal end; His, Lys, Arg prefer the C-terminal end (due to the dipole of the helix)

(5) Beta-sheets need 5 amino acids or longer with Pb > 1.05 and Pb > Pa for that region
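The nucleation step can be sketched as a sliding-window count. This is a simplified illustration, not the full Chou-Fasman procedure: instead of numeric P-values it treats the "common" residues listed in the Experimental Dataset section as the formers (P > 1.0), it assumes a six-residue window for helix nucleation, and it skips the extension, termination, and boundary rules. The sequences are hypothetical.

```python
def nucleation_sites(sequence, formers, window, required):
    """Return 0-based start positions of windows containing at least
    `required` residues drawn from `formers`."""
    starts = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        if sum(aa in formers for aa in segment) >= required:
            starts.append(start)
    return starts


# Assumption: the "common" residues from the lab text stand in for P > 1.0.
HELIX_FORMERS = set("EALH")  # Glu, Ala, Leu, His
BETA_FORMERS = set("MVIC")   # Met, Val, Ile, Cys

toy = "SGAELAEHGPSG"  # hypothetical sequence

# Helix nucleation: 4 formers within a window of 6 (window size assumed).
print(nucleation_sites(toy, HELIX_FORMERS, window=6, required=4))
# prints [0, 1, 2, 3, 4]

# Beta nucleation: 3 formers out of 5.
print(nucleation_sites(toy, BETA_FORMERS, window=5, required=3))
# prints []
```

A fuller version would replace the membership test with a lookup in a table of Pa and Pb values and then extend each nucleus in both directions until the tetrapeptide average drops below 1.0.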


Example Calculation

(1) 2473 total amino acids (AA)

(2) 890 AA in alpha-helices, 424 AA in beta-sheet, and 1159 AA in coils

(3) Normalization Factor (P-value) for alanine

       Frequency of alanine in helix, sheet or coil

            – 228 total alanines (119 in helix, 38 in sheet, 71 in coil)

            – fa = 119/228 = 0.522 helix, fb = 38/228 = 0.167 sheet, fc = 71/228 = 0.311 coil


       Average frequencies

            – <fa> helix = 890/2473 = 0.360

            – <fb> sheet = 424/2473 = 0.171

            – <fc> coil = 1159/2473 = 0.469


       P value calculation

            – fa/<fa> = 0.522/0.360 = 1.45 for alanine (strong probability for helix)

            – fb/<fb> = 0.167/0.171 = 0.97 (lower probability for sheet)

            – fc/<fc> = 0.311/0.469 = 0.66 (low probability coil)
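The alanine calculation can be reproduced directly from the counts given in this example. The sketch below works from the unrounded counts, so the last digit may differ slightly from a hand calculation that rounds at each step; the function and variable names are made up for illustration.

```python
TOTAL_RESIDUES = 2473
RESIDUES_IN_STATE = {"helix": 890, "sheet": 424, "coil": 1159}
ALANINE_IN_STATE = {"helix": 119, "sheet": 38, "coil": 71}


def conformational_parameters(aa_counts, state_counts, total):
    """Return P = f / <f> for one amino acid in each structural state."""
    aa_total = sum(aa_counts.values())      # 228 alanines in the data set
    params = {}
    for state, count in aa_counts.items():
        f = count / aa_total                # fraction of this AA in the state
        f_avg = state_counts[state] / total # fraction of all AA in the state
        params[state] = f / f_avg
    return params


p_ala = conformational_parameters(
    ALANINE_IN_STATE, RESIDUES_IN_STATE, TOTAL_RESIDUES
)
for state, value in p_ala.items():
    print(f"{state}: P = {value:.2f}")
# helix: P = 1.45
# sheet: P = 0.97
# coil: P = 0.66
```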

General Trends

(1) Glu, Ala, Leu strong alpha-helix formers

(2) Val, Ile, Tyr, Cys strong beta-sheet formers

(3) Gly, Pro strong coil formers/helix breakers

Manual Secondary Structure Prediction

  • Assign a set of P-values to the following sequence(s)


Arg     Asn     Ala     Glu    His     Lys     His     Ala     Glu    Leu   Gly    Pro


  • Predict whether this span of amino acids is more likely to be alpha helix, beta sheet, or coil


Computer Based Secondary Structure Prediction

We can also use the EXPASY Proteomics Server and Biology Workbench to make predictions for secondary structure elements for an input primary sequence.


ExPASY Style

  • Go to ExPASY

  • Under the Tools area highlight Secondary Structure Prediction

  • Under this section there are many methods to predict secondary structure from the primary sequence

  • You can also go back to P00698

  • This page has options to analyze the lysozyme sequence directly

  • Under the Sequences sections use the pulldown under Tools

  • Select ProtScale and hit go

  • From here there are a number of different analysis tools

  • One is alpha-helix Chou-Fasman

  • Select and run this program


Biology Workbench Style

  • Go to the Biology Workbench site.

  • Select the lysozyme sequence

  • Run PELE

  • View the JOI (best composite prediction)

Tertiary Structure (3°) results from the collapse of the secondary elements, driven by the hydrophobic effect. The hydrophobic effect is explained by the placement of the non-polar amino acids into the interior of the folded protein. Burying these sidechains increases the entropy of water, and this is thought to be the driving force during protein folding. Bonding at this level involves sidechain (R-group) interactions and includes the non-covalent bonding types: ion pairs, hydrogen bonds, and hydrophobic interactions. In addition, disulfide bonds, the second form of covalent bond (after the peptide bond), may exist at this level.

Quaternary Structure (4°) utilizes all of the bonding described for tertiary structure. Again, it is driven by interactions between R-groups. However, this higher-order structure assembles monomeric tertiary structures into oligomers. Dimers, trimers, tetramers, pentamers, and hexamers are all commonly occurring types of oligomeric or quaternary structure. These quaternary structures are typically symmetrically arranged.

Lactate dehydrogenase (LDH) is displayed in its monomeric and tetrameric forms as colored by secondary structure.

Patterns, Motifs and Domains

Patterns or sites are small sections of consecutive amino acids that harbor a function or are a location subject to modification. Examples are phosphorylation (phosphate), glycosylation (sugar), and myristylation (fat) sites. These sites occur frequently in proteins because they are defined by only a few amino acids. Thus, within a typical protein of ~200 amino acids, a given three amino acid pattern is likely to turn up by chance alone.
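A rough back-of-the-envelope estimate shows why short patterns turn up so often. The sketch below assumes, purely for illustration, that all 20 amino acids are equally likely at every position (real proteins are not this uniform) and counts how many chance matches to a three-residue pattern of the form [ST]-x-[RK] would be expected in a 200-residue protein.

```python
def expected_random_matches(protein_length, allowed_per_position,
                            alphabet_size=20):
    """Expected number of chance matches to a short pattern.

    allowed_per_position lists how many of the 20 amino acids are
    accepted at each pattern position. Assumes every amino acid is
    equally likely at every position (a simplification).
    """
    p_match = 1.0
    for allowed in allowed_per_position:
        p_match *= allowed / alphabet_size
    windows = protein_length - len(allowed_per_position) + 1
    return windows * p_match


# [ST]-x-[RK]: 2 choices, then any residue, then 2 choices.
print(round(expected_random_matches(200, [2, 20, 2]), 2))
# prints 1.98
```

About two matches are expected by chance alone, so a hit from a pattern this short is a candidate site rather than proof of modification.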


Motifs (or super-secondary structures) are built from simple arrangements of secondary elements and are typically only structural. Some commonly recruited motifs include the beta-hairpin, the Greek Key, the Zinc Finger, the beta-alpha-beta motif, and the alpha helix-turn-alpha helix motif. These structural elements are stable, and several are used to connect beta-strand elements: the beta-hairpin connects adjacent anti-parallel strands, while the beta-alpha-beta motif connects parallel strands.

Some motifs are built from much smaller arrangements of amino acids that are quite far apart in the primary sequence.  For example, the trypsin family catalytic triad contains Aspartic acid-102, Histidine-57, and Serine-195, located in close proximity within the active site. However, from the primary structure one would not predict that these three residues lie close enough together to form a functional motif within the trypsin active site. Only after a number of other biochemical and structural studies were conducted could these three residues be grouped into the trypsin family catalytic triad motif.  A number of programs like PROSEARCH and PPSEARCH will search the PROSITE database for motifs and patterns.

Domains are known as functional units within proteins. Domains range in size from 36 amino acids up to 692 amino acids. The majority of domains have fewer than 200 amino acids, and the average domain harbors 100 amino acids.

Conserved domains are functional regions within proteins that were pieced together during molecular evolution. In this fashion, new proteins with different sets of functions can be generated over time, leading to evolutionary changes. These types of domains appear as clusters of conserved amino acid sequence. In fact, these unique arrangements of amino acid clusters are used to identify the so-called conserved domain. Multiple sequence alignments provide information regarding where plausible conserved domains lie in a protein sequence.

A second type of domain classification is 3D domains. These domains are based upon known and conserved three-dimensional shapes of proteins. 3D domains are recruited during evolution as stably folded and functional units. A conserved domain may not yet have a representative 3D domain. A 3D domain prediction requires that a known 3D structure be provided during the comparison. The program SMART (Simple Modular Architecture Research Tool) can be used to find functional domains in primary sequences.

The SH3 domain is a good example of both a conserved and 3D domain. SH3 domains are beta-barrel shaped and contain five to six beta-strands oriented in anti-parallel fashion. SH3 domains typically recognize proline-rich partners carrying a small consensus sequence -X-P-p-X-P-, where X = aliphatic, p = sometimes proline, and P = always proline. SH3 domains are used to facilitate protein complex formation.  SH3 domains are also thought to increase the substrate selectivity of some kinases.

In the upper corner of the RCSB homepage, the PDB Statistics link provides some interesting information regarding the protein structures deposited to the RCSB PDB.  One of the more interesting pieces of information is the number of folds determined by year. Starting in the mid-1980s there was a steady increase in the total number of protein folds. This increase reached a peak in 2007.


Pattern, Motif and Domain Prediction

There are a number of programs available to search primary sequence for sites, motifs and domains. These programs sift through databases that have compiled sets of unique sites, motifs, and domains. These databases grew rapidly in the 1980s and 90s based upon functional and structural studies on many new proteins.


ExPASY Style

  • Go to ExPASY

  • Under the Tools & Software area click the Proteomics tools

  • Go to the Pattern and profile searches

  • Click on PPSearch

  • Select and run this program

  • Copy and paste the lysozyme sequence into the box

  • Click Yes under Include Abundant Patterns

  • The four PS links provide information on the patterns

  • Additional links are found on the pattern page

  • The PRU link provides the basic rule

  • The PDOC link provides a paragraph of information


  • Matching pattern PS00005 PKC_PHOSPHO_SITE:

  •    43: TNR

  • Total matches: 1


  • Matching pattern PS00008 MYRISTYL:

  •    26: GNWVCA

  •   102: GNGMNA

  • Total matches: 2


  • Matching pattern PS00128 LACTALBUMIN_LYSOZYME_:


  • Total matches: 1


  • Matching pattern PS00342 MICROBODIES_CTER:

  •   127: CRL

  • Total matches: 1


  • Total no of hits in this sequence: 5

Biology WorkBench Style

  • Select the lysozyme sequence

  • Submit this sequence to PPSEARCH

  • Click on Include redundant patterns

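  • The output matches PPSEARCH from ExPASY, except that every position is 18 higher (for example 61 instead of 43), because the sequence used here carries 18 additional residues at its amino terminus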


  • Sequence LYSC_CHICK (147 residues):


  • Matching pattern PS00005 PKC_PHOSPHO_SITE:

  •    61: TNR

  • Total matches: 1


  • Matching pattern PS00008 MYRISTYL:

  •    44: GNWVCA

  •   120: GNGMNA

  • Total matches: 2


  • Matching pattern PS00128 LACTALBUMIN_LYSOZYME_:


  • Total matches: 1


  • Matching pattern PS00342 MICROBODIES_CTER:

  •   145: CRL

  • Total matches: 1


  • Total no of hits in this sequence: 5
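To see what a pattern search is doing under the hood, the sketch below translates simple PROSITE-style patterns into Python regular expressions and reports 1-based match positions in the same spirit as the PPSEARCH output above. The three patterns are quoted from memory of the PROSITE entries and should be checked against the PS/PDOC pages before you rely on them. The test sequence is a made-up string stitched together from matched fragments in the output, not a real protein, and the translator only handles the basic syntax elements noted in its docstring.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a regular expression.

    Handles x (any residue), [..] (allowed), {..} (forbidden),
    (n) or (n,m) repeats, and the < and > terminus anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("x", ".")
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based position, matched text) for every match,
    including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Patterns as recalled from PROSITE (assumption; verify on the PS pages).
PATTERNS = {
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
    "MICROBODIES_CTER": "[STAGCN]-[RKH]-[LIVMAFY]>",
}

toy = "AAGNWVCAAATNRAACRL"  # hypothetical sequence built from matched fragments

for name, pattern in PATTERNS.items():
    print(name, scan(toy, pattern))
# PKC_PHOSPHO_SITE [(11, 'TNR')]
# MYRISTYL [(3, 'GNWVCA')]
# MICROBODIES_CTER [(16, 'CRL')]
```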


Entrez Style

  • Entrez can be used to search for motifs and domains

  • Click on CDD – conserved protein domain database

  • Click on search methods

  • Copy and paste the lysozyme sequence in the Protein Query Sequence window

  • Use CDD v2.18 as the Search Database

  • Hit submit

  • Mouse over the various domains found on the red triangles/lines






SMART Style

  • Enter the SMART site

  • Cut/paste the lysozyme sequence in to the Sequence box

  • Select the PFAM domains to search

  • Hit Sequence SMART

  • The graphic below is interactive and provides lysozyme information

[Figure: SMART domain graphic for lysozyme]

Pyruvate Kinase

Patterns and Motifs ExPASY Style

  • Go to ExPASY

  • Under the Tools & Software area click the Proteomics tools

  • Go to the Pattern and profile searches

  • Click on PPSearch

  • Select and run this program

  • Copy and paste the pyruvate kinase sequence into the box

  • Click Yes under Include Abundant Patterns

  • The six PS links provide information on the patterns

  • Additional links are found on the pattern page

  • The PRU link provides the basic rule

  •  The PDOC link provides a paragraph of information


  • Sequence /ebi/extserv/old-work/ppsearch-20100111-0657419760.input (530 residues):


  • Matching pattern PS00001 ASN_GLYCOSYLATION:

  •    74: NFSH

  • Total matches: 1


  • Matching pattern PS00005 PKC_PHOSPHO_SITE:

  •    40: TAR

  •    59: TLK

  •    86: TIK

  •   138: TLK

  •   204: SKK

  •   221: SEK

  •   364: TAK

  •   419: SYK

  •   433: SGR

  •   458: TAR

  •   523: TMR

  • Total matches: 11


  • Matching pattern PS00006 CK2_PHOSPHO_SITE:

  •     3: SHSE

  •    24: TFLE

  •    59: TLKE

  •    92: TATE

  •   194: TEVE

  •   221: SEKD

  •   268: SKIE

  •   340: TRAE

  • Total matches: 8


  • Matching pattern PS00008 MYRISTYL:

  •    45: GIICTI

  •    67: GMNVAR

  •   121: GLIKGS

  •   125: GSGTAE

  •   199: GGFLGS

  •   203: GSKKGV

  •   288: GIMVAR

  •   344: GSDVAN

  •   414: GSVEAS

  •   517: GSGFTN

  • Total matches: 10


  • Matching pattern PS00016 RGD:

  •   293: RGD

  • Total matches: 1


  • Matching pattern PS00110 PYRUVATE_KINASE:


  • Total matches: 1


  • Total no of hits in this sequence: 32

Domains SMART Style

  • Go to the SMART site

  • Submit the pyruvate kinase sequence for analysis under the PFAM domains

Transmembrane Regions

Some proteins have one or more portions of their amino acid sequence embedded within the lipid bilayer. The regions of the protein sequence that are embedded within a bilayer must be hydrophobic. There are bioinformatic programs that are able to predict the hydrophobicity of an amino acid sequence. Again, both ExPASY and Biology Workbench have some of these programs accessible.

BIOLOGY WORKBENCH contains three programs for determining regions of hydrophobicity in a protein and potential membrane spanning domains.  Enter the Biology Workbench and select your sequence under Protein Tools.  Next select one of the programs listed below.

GREASE   allows you to generate a Kyte-Doolittle Hydropathy Profile.  This does not predict secondary structure, so it will detect both alpha helix and beta sheet transmembrane domains.  Numbers greater than 0 indicate increased hydrophobicity; numbers less than 0 indicate an increase in hydrophilic amino acids.

TMHMM   allows you to predict the location of transmembrane alpha helices and the location of intervening loop regions.  This program will also predict which loops between the helices will be on the inside or outside of the cell or organelle.  This program will not detect beta sheet transmembrane domains.  It takes about 20 amino acids to span a lipid bilayer in an alpha helix.  Programs can detect these transmembrane domains by looking for an alpha helix about 20 amino acids long that is made up of hydrophobic amino acids.

TMAP  uses a Kyte-Doolittle Hydropathy Profile to detect transmembrane spanning domains.  This does not require that the domain be an alpha helix, as in TMHMM.  It also provides the amino acid numbers for the transmembrane domain.  This is especially useful for detecting signal peptides.  A signal peptide is a short hydrophobic sequence at the amino terminus of eukaryotic proteins targeted for the endoplasmic reticulum and often for secretion.


Transmembrane Prediction

A prediction of hydrophobic regions of proteins is based upon the Hydropathy Index. Values greater than zero indicate a hydrophobic nature, while values less than zero indicate hydrophilicity.
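A hydropathy profile is a sliding-window average of per-residue hydropathy values. The sketch below uses the Kyte-Doolittle scale; the window length is an assumption chosen for illustration (the window GREASE actually uses is not stated here), and the test sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return (center position, mean hydropathy) for each full window.

    Positions are 1-based and refer to the middle residue of the window.
    """
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
toy = "KKDEKLLIVLLAVLIKKDEK"
for position, score in hydropathy_profile(toy, window=9):
    label = "hydrophobic" if score > 0 else "hydrophilic"
    print(position, round(score, 2), label)
```

Plotting the score against position gives the kind of curve shown in the GREASE output, with candidate membrane-spanning stretches appearing as sustained peaks above zero.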


  • Under BIOLOGY WORKBENCH, go to protein tools

  • Select lysozyme

  • Run through GREASE, TMHMM, and TMAP

Grease Output

[Figure: GREASE output plot for lysozyme]

TMAP Output


  TM  1:    6 -   28 (23)



[Figure: TMAP output plot for lysozyme]

TMHMM Output

0_LYG_CHIC Length: 211

0_LYG_CHIC Number of predicted TMHs:  1

0_LYG_CHIC Exp number of AAs in TMHs: 20.46703

0_LYG_CHIC Exp number, first 60 AAs:  20.18892

0_LYG_CHIC Total prob of N-in:        0.45484

0_LYG_CHIC POSSIBLE N-term signal sequence

0_LYG_CHIC    TMHMM2.0  outside        1     9

0_LYG_CHIC    TMHMM2.0  TMhelix       10    32

0_LYG_CHIC    TMHMM2.0  inside        33   211
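When you need the helix boundaries in a table or figure legend, the topology lines of this output can be pulled apart with a few lines of code. The sketch below assumes the text layout shown above (name, `TMHMM2.0`, region label, start, end); `parse_segments` is a hypothetical helper name.

```python
import re

# Lines copied from the lysozyme TMHMM output above.
TMHMM_OUTPUT = """\
0_LYG_CHIC Length: 211
0_LYG_CHIC Number of predicted TMHs:  1
0_LYG_CHIC    TMHMM2.0  outside        1     9
0_LYG_CHIC    TMHMM2.0  TMhelix       10    32
0_LYG_CHIC    TMHMM2.0  inside        33   211
"""

SEGMENT = re.compile(r"TMHMM2\.0\s+(outside|inside|TMhelix)\s+(\d+)\s+(\d+)")


def parse_segments(text):
    """Return (label, start, end) for every topology line in the output."""
    segments = []
    for line in text.splitlines():
        match = SEGMENT.search(line)
        if match:
            label, start, end = match.groups()
            segments.append((label, int(start), int(end)))
    return segments


for label, start, end in parse_segments(TMHMM_OUTPUT):
    print(label, start, end, "length", end - start + 1)
# outside 1 9 length 9
# TMhelix 10 32 length 23
# inside 33 211 length 179
```

The single predicted helix is 23 residues long, in line with the roughly 20 residues needed to span a bilayer, and it overlaps the region TMAP reports (6 - 28).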


[Figure: TMHMM output plot for lysozyme]

  • Under BIOLOGY WORKBENCH, go to protein tools

  • Use NDJINN to locate files containing the protein sequence of rhodopsin (HSU49742)

  • Use the GBPRI database to search only primate sequences

  • Import this sequence and run through GREASE, TMHMM, and TMAP


Grease Output

[Figure: GREASE output plot for rhodopsin]

TMAP Output


TM  1:   45 -   71 (27)

TM  2:   75 -   99 (25)

TM  3:  114 -  142 (29)

TM  4:  150 -  178 (29)

TM  5:  203 -  231 (29)

TM  6:  257 -  277 (21)

TM  7:  284 -  304 (21)

[Figure: TMAP output plot for rhodopsin]

TMHMM Output

  • 0_1236136_123613 Length: 348

  • 0_1236136_123613 Number of predicted TMHs:  7

  • 0_1236136_123613 Exp number of AAs in TMHs: 157.99471

  • 0_1236136_123613 Exp number, first 60 AAs:  21.69013

  • 0_1236136_123613 Total prob of N-in:        0.00977

  • 0_1236136_123613 POSSIBLE N-term signal sequence

  • 0_1236136_123613   TMHMM2.0  outside        1    38

  • 0_1236136_123613   TMHMM2.0  TMhelix       39    61

  • 0_1236136_123613   TMHMM2.0  inside        62    73

  • 0_1236136_123613   TMHMM2.0  TMhelix       74    96

  • 0_1236136_123613   TMHMM2.0  outside       97   110

  • 0_1236136_123613   TMHMM2.0  TMhelix      111   133

  • 0_1236136_123613   TMHMM2.0  inside       134   152

  • 0_1236136_123613   TMHMM2.0  TMhelix      153   175

  • 0_1236136_123613   TMHMM2.0  outside      176   201

  • 0_1236136_123613   TMHMM2.0  TMhelix      202   224

  • 0_1236136_123613   TMHMM2.0  inside       225   253

  • 0_1236136_123613   TMHMM2.0  TMhelix      254   276

  • 0_1236136_123613   TMHMM2.0  outside      277   285

  • 0_1236136_123613   TMHMM2.0  TMhelix      286   308

  • 0_1236136_123613   TMHMM2.0  inside       309   348

[Figure: TMHMM output plot for rhodopsin]

Report for Unit 4 (50 points total)

We would like a formal written report with the following information.  Don't paste in the questions, these are just to help you be organized.  You can create figures in your report by right clicking on an image, and then copy and paste it into your report.  Don't add lots of extra output, i.e. names and accession numbers from Biology Workbench, just the figure and a figure legend, and then explain what it means in your well-written, rational report.

1.  Perform a BLASTP on your assigned sequence against the PDBFINDER (sequence of the protein from a crystal structure) and SWISSPROT-HUMAN (sequence from the DNA) databases in Biology Workbench (you can select both simultaneously using the Ctrl key) or use BLASTP at the ExPASY site directly.

Include a brief one paragraph description of the protein you were assigned, i.e. what is it, what does it do, is it in any biochemical pathways, in what organs is it found, is it intra or extracellular, does it have a signal peptide, transmembrane domains, etc. You can use a number of sites to identify the function of your assigned protein (PROSEARCH, PPSEARCH, SMART, GREASE, TMHMM, TMAP, ExPASY UniPathway, ExPasy Enzyme, etc.). To help guide you, use the information on patterns, motifs, etc. below.

Patterns and motifs

Use either PPSEARCH or PROSEARCH to find patterns and small motifs. You can use these in either ExPasy or Biology Workbench Style.


Use the SMART site to search for functional domains on your sequence.

Transmembrane Domains

Use GREASE, TMHMM, and TMAP to predict any transmembrane domains. Use the Biology Workbench site for this.

Include output from these programs to support your discussion.

ExPASY UniPathway, ExPasy Enzyme

Use these sites under the EXPASY homepage to help identify your protein (and its pathway, if it is a metabolic enzyme). ExPASY UniPathway may provide information on a pathway for your protein, while ExPASY Enzyme provides information on your enzyme; if you use the Biochemical Pathways link you can also get an image of where your enzyme functions. The Related tools and databases under the Enzyme link have other sources of information as well.

2. A PyMol image containing the 3D structure of your protein in cartoon in three forms.

A. One form with the secondary structures colored by type.

Include a comparison between what the secondary structure prediction output predicted and what the actual 3D structure depicts in your image A.

B. A second image drawn as cartoon with at least one representative of each pattern and motif found using PPSEARCH OR PROSEARCH.  Each motif/pattern should have its own color and represent the sidechains as sticks color coordinated.

Include a description of the motifs found and highlighted in your image B.

C. A third image (if possible) that uses the SMART output to color code multiple domains independently. This may not be possible for all provided amino acid sequences. If you have a predicted domain(s) color these separately on a cartoon drawing.

Include a description of the domains found and their functional or structural importance to your protein as drawn in image C.

3.  A description of the family of proteins to which your protein belongs (paralogs, Lab 4.2).  For this portion of the unit we would like you to identify and import 6-7 related human protein sequences (not 6-7 different sequences of the same protein) and align these sequences.  Try to choose several paralogs, not just the most closely related, but don't go much below score of 100 or the alignments won't be very good.  If there is a known motif for the class of protein your protein falls into, look for this motif in your aligned sequences. 

Include a picture of the amino acid sequence alignment and the 3D alignment from Consurf (Lab 4.3).  Which regions seemed to be conserved and is this consistent with what you know about the active site or binding site of the protein?


4.  A description of the evolution of your protein in different species (orthologs, Lab 4.2).  For this portion of the unit we would like you to identify and import 6-7 protein sequences of this same protein from different species and align these sequences.  Try to choose some distantly related species for comparison, i.e. can you find this protein in yeast or bacteria?  If there is a known motif for the class of protein your protein falls into, look for this motif in your aligned sequences. 

Include a picture of the amino acid sequence alignment and the 3D alignment from Consurf (Lab 4.3).  Which regions seemed to be conserved and is this consistent with what you know about the active site or binding site of the protein? If both the paralog and ortholog alignments worked include both in your final report. If not, only include the one that seems to have worked well.


5.  A conclusion summarizing your findings.  Specifically comment on the following

Where is the most sequence similarity seen on the 3D structural alignments?  Were orthologs or paralogs more highly conserved?  Is this consistent with the relative functions of orthologs and paralogs?

Include how these bioinformatics tools have helped you to better understand the evolution and function of your assigned protein. If there were limitations to the programs, mention these as well, i.e. how accurate were the bioinformatics programs you used in predicting motifs, secondary structure, etc.


Note: When writing this report do not simply attach the output from various programs at the end. You MUST embed all output within the report and near where you are discussing its relevance. This will take some organizational work on your part. It makes no sense to talk about the patterns, motifs, and domains on page 1 of the report and have the supporting output on page 5.  I will NOT accept reports written where the images from various programs are simply added on to the end.



  2000-2010 The Board of Regents of the University of Wisconsin System.

Click here to email comments to Scott Cooper regarding this site or its links.