Lab 4.1 Proteomics

The general outline for the last labs is to become familiar with the programs used to analyze protein sequences and structures using some protein examples. You will each be assigned a protein to explore individually, using the techniques we went through in lab. There should be time during lab for you to work on your individual projects, although some work may need to be done outside of class. This section focuses on proteomics, which is the study of protein structure and function using computers, databases, and protein sequences/structures. The first section provides a graphical review of protein structure from primary through quaternary levels.

Protein Structure

Protein structures can be represented in a number of ways. Cartoon versions are often used to emphasize where secondary structure elements exist. Below, the same segment spanning amino acids Leu25 through Glu35 has been represented in one of three ways (cartoon, lines and sticks, or spheres). As a cartoon drawing, only the mainchain segments are represented and all sidechain information is left out of the final representation. This viewing option allows for clarity and easily recognized secondary elements. Lines and sticks are often used for smaller sections of a protein. The lines and sticks in a sense represent the bonds shared between atoms. In addition, each element can be assigned a unique color. In the drawing below carbon atoms are green, nitrogen blue, oxygen red and sulfur yellow. Finally, protein structures can be represented with each atom drawn as a sphere (volume). In this type of representation the compactness of the protein can be seen.

Lysozyme will be used to emphasize certain aspects of protein structure. Below is a cartoon version of lysozyme in two orientations. The cartoon drawing depicts the various secondary structures colored by type: red (alpha-helices), yellow (beta-strands), and green (random coil).

Rhodopsin will be used to emphasize transmembrane domains at the end of this lab.

Primary structure (1°) is defined by the amino acid sequence. Adjacent amino acids are linked via a peptide bond. The NCBI web site has a useful amino acid analysis interface under the ALL Resources (A-Z) link and then under Amino Acid Explorer.

Primary structure is typically defined as the sequence of amino acids read from amino terminus to carboxy terminus. Adjacent amino acids are covalently connected via a peptide bond formed on the ribosome by peptidyl transferase. Ribosomal RNA, and not the proteins of the large ribosomal subunit, provides the actual peptidyl transferase activity. Thus, the ribosome is described as a ribozyme. Primary structure is typically written in one-letter code, with amino acid 1 providing the free amino terminus. Coloring in protein structures drawn in stick and line format is typically by element: carbon – green, oxygen – red, nitrogen – blue, and sulfur – yellow. The user may modify these colors, but typically oxygen, nitrogen and sulfur do not change.
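Because most of the programs in this lab expect one-letter code, it can help to convert a sequence written in three-letter code. The short Python sketch below is only an illustration; the function name to_one_letter is made up for this example, and it assumes the residues are separated by spaces and are limited to the 20 standard amino acids.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq):
    """Convert a space-separated three-letter sequence (N- to C-terminus)
    into a one-letter string. Raises KeyError for unknown residues."""
    return "".join(
        THREE_TO_ONE[residue.capitalize()]
        for residue in three_letter_seq.split()
    )


print(to_one_letter("Lys Val Phe Gly"))  # KVFG
```

The first letter of the output corresponds to the free amino terminus, matching the convention described above.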

The primary sequence can be used to predict potential secondary structural elements. In addition, a number of primary sequences of the same protein from different sources can be aligned for identities and similarities. This type of alignment comparison is useful during the prediction of what are termed conserved domains. More on this later.

Primary Sequence Search

There are many ways to search, import, view and analyze primary sequences from a variety of different databases. A very commonly used proteomics server is ExPASY. In the ExPASY site we could go under the databases section and find a protein sequence by name and organism. We can also search for a protein sequence using NDJINN within Biology Workbench.

ExPASY Style

Go to the ExPASY site and in the open box in the upper right hand corner search for lysozyme.

Under the Databases click on UniProtKB

In Query window type lysozyme

Click on the Fields link

Pulldown to Organism

Type in Chicken

Click Add & Search

Click on P00698

Under the Sequences area click on the FASTA link

Copy and paste just the protein sequence into a file (use Notepad Program)

This sequence will be used later
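If you prefer to pull the sequence out of the FASTA text with a script instead of by hand, something like the following Python sketch would work. It is not part of ExPASY or Biology Workbench; the function name and the example header line are hypothetical, and it assumes the text starts with a ">" header line followed by sequence lines.

```python
def read_fasta_sequence(text):
    """Return the sequence of the first record in FASTA-formatted text,
    with the '>' header line removed and line breaks joined."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    parts = []
    for line in lines[1:]:
        if line.startswith(">"):  # a second record begins; stop here
            break
        parts.append(line)
    return "".join(parts).replace(" ", "").upper()


# Hypothetical two-line record, used only to show the input format.
example = """>example_header some description
KVFGRCELAA
AMKRHGLDNY
"""
print(read_fasta_sequence(example))  # KVFGRCELAAAMKRHGLDNY
```

The returned string is "just the protein sequence" in the sense used above and can be pasted directly into the other tools.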

This web page provides a lot of information based upon the primary sequence you selected for lysozyme. The page provides the name of the protein, the length of the sequence, the source organism, the function of the protein, important sites and amino acids within the protein and even a map of the secondary structural elements (more on this later).

Biology Workbench Style

Go to the Biology Workbench site.

In the Session Tools area Start New Session

Go into the Protein Tools area

Under the program area click on NDJINN

In the white box type in lysozyme AND chick

Select the SWISSPROT database to search

Click Search

Select the first sequence SWISSPROT:LYG_CHICK

Import this sequence

Once you have this sequence in your Protein Tools area you can subject it to a number of other programs within Biology Workbench.

Secondary structure (2°) forms via a repeating pattern of hydrogen bonds shared between mainchain NH and CO groups. There are two common forms of secondary structural elements, termed alpha-helix and beta-sheet.

Alpha-helices form via a series of intrachain hydrogen bonds. The bonding forms between the carbonyl oxygen (CO) at residue number n and the amide hydrogen (NH) at residue n + 4. This repeating arrangement of hydrogen bonds yields 3.6 amino acids per turn and a rise of 5.4 Å per turn of alpha-helix.

Alpha helices often have a sidedness to their appearance, where one side is predominantly polar and the other is clearly hydrophobic. This type of alpha-helix is termed amphipathic.
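The numbers above (3.6 residues and 5.4 Å per turn) are enough to sketch the geometry of an ideal helix in a few lines of Python. The sketch below is only an illustration: the function names are made up, and the 70° cutoff used to decide whether two residues sit on the same face is an arbitrary assumption.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_TURN = 5.4  # angstroms per turn, from the text above


def helix_length(n_residues):
    """Length along the helix axis (angstroms) of an ideal alpha-helix."""
    return n_residues * RISE_PER_TURN / RESIDUES_PER_TURN


def wheel_angle(i):
    """Angle (degrees) of residue i when looking down the helix axis.
    Each residue is rotated 360/3.6 = 100 degrees from the previous one."""
    return (i * 360.0 / RESIDUES_PER_TURN) % 360.0


def same_face(i, j, tolerance=70.0):
    """True if residues i and j point in roughly the same direction.
    The tolerance is an assumed cutoff, not a standard value."""
    diff = abs(wheel_angle(i) - wheel_angle(j))
    diff = min(diff, 360.0 - diff)
    return diff <= tolerance


for j in (1, 2, 3, 4, 7):
    print(j, round(wheel_angle(j)), same_face(0, j))
```

In this sketch residues n+3, n+4 and n+7 fall on the same side as residue n, which is why hydrophobic residues spaced three to four positions apart produce the hydrophobic face of an amphipathic helix.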

The alpha helix within lysozyme is shown as part of the complete 3D structure. The underside of the helix harbors the hydrophobic face, while the polar side projects out toward solvent.

Beta-sheets also assemble from hydrogen bonds between carbonyl oxygen (CO) and amide hydrogen (NH) atoms. However, these hydrogen bonds do not follow a fixed periodic spacing along the sequence. They form across adjacent beta-strands, which may be far apart in the primary sequence, and are thus difficult to predict from the primary sequence.

[Figure: parallel beta-strands]

The beta-helix is another way parallel beta-strands can be utilized to build tertiary and quaternary structure. The folding of this structure proceeds progressively from the top to the bottom by wrapping parallel beta-strands in coils. This is likened to wrapping wire around a pencil. The images below show the coil that the parallel strands form.

Secondary Structure Prediction

Secondary structure predictions can be made using a number of different programs. These programs have been written based upon many known 3D protein structures. Protein structures have been solved and deposited into a database known as the RCSB Protein Data Bank (PDB) for over 33 years. The first 13 protein structures were solved and deposited in 1976 and as of January 1, 2010 over 62,000 protein structures have been determined and submitted to the PDB. One of the original algorithms written to predict secondary structure was designed by two scientists, Chou and Fasman. Their Chou-Fasman Method has been used successfully to predict many secondary elements from only the primary sequences.

Chou-Fasman Method

(1) Used a set of known protein structures from the RCSB PDB (known structures with locations of alpha helices, beta sheets, and random coils). In fact, they also knew which amino acids were in each type of secondary element.

(2) Assigned probability of finding an amino acid in either helix, sheet or coil (not a secondary element)

(3) Designed a scanning algorithm for prediction of secondary structure from linear amino acid sequences

Experimental Dataset

(1) Select 15 known protein structures with 2473 total amino acids (AA)

(2) Break down where these 2473 AA were located (helix, sheet, coil)

(3) Derive a normalization procedure to predict

alpha – common AA are glu, ala, leu, and his

beta – common AA are met, val, ile, cys

coil – common AA are gly, ser, pro, asn

Normalization Factor

(1) Define f = frequency of a given AA in helix, sheet or coil (# of that AA in the secondary element/total # of that AA)

(2) Define average frequency <f> = summation of f for all AA in a category/20 AA (provides the average frequency of an AA in either helix, sheet or coil; the example calculation below uses the overall fraction of residues in each category for this value)

(3) Define protein conformational parameter for each AA as P = f/<f> (provides normalization factor for predicting whether an AA is found in helix, sheet or coil)

(4) P > 1.0 is a strong indication that the AA is found within that secondary element

Prediction Rules

(1) Nucleation Point - cluster of 4 helix formers (P_a > 1.0) or 3 out of 5 beta formers (P_b > 1.0)

(2) Helix/Beta Termination - extend in both directions until a tetrapeptide with P < 1.0 is hit

(3) Pro not in helices nor Glu/Pro in sheets

(4) Boundaries - Pro, Asp, Glu prefer N-terminal end, His, Lys, Arg prefer C-terminal end (due to dipole of helix)

(5) Beta-sheet needs 5 amino acids or longer with P_b > 1.05 and P_b > P_a for that region
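The nucleation rule (rule 1) can be sketched as a sliding-window scan. The Python below is a rough illustration and not the program used by ExPASY or Biology Workbench. Two assumptions are made: the helix cluster is searched for in a window of 6 residues (the window size commonly quoted for the Chou-Fasman method; the rule above does not state one), and the P values are the widely reproduced Chou-Fasman parameters from a later, larger dataset, so they differ slightly from the 15-protein values derived in this lab. Substitute the table handed out in class if it differs.

```python
# Assumed Chou-Fasman style parameters (not derived in this lab).
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}
P_BETA = {
    "V": 1.70, "I": 1.60, "Y": 1.47, "F": 1.38, "W": 1.37, "L": 1.30,
    "C": 1.19, "T": 1.19, "Q": 1.10, "M": 1.05, "R": 0.93, "N": 0.89,
    "H": 0.87, "A": 0.83, "S": 0.75, "G": 0.75, "K": 0.74, "P": 0.55,
    "D": 0.54, "E": 0.37,
}


def nucleation_sites(seq, table, window, min_formers, threshold=1.0):
    """Return 1-based start positions of windows that contain at least
    min_formers residues with P above the threshold."""
    sites = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        formers = sum(1 for aa in segment if table[aa] > threshold)
        if formers >= min_formers:
            sites.append(start + 1)
    return sites


# Made-up test peptides, not taken from a real protein.
print(nucleation_sites("GPGEALLEGPG", P_ALPHA, window=6, min_formers=4))
print(nucleation_sites("NGVIVYNG", P_BETA, window=5, min_formers=3))
```

This covers only the nucleation step. Extension, termination and the boundary rules (rules 2 through 5) would still need to be applied to each site found.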

Example Calculation

(1) 2473 total amino acids (AA)

(2) 890 AA in alpha-helices, 424 AA in beta-sheets, and 1159 AA in coils

(3) Normalization Factor (P-value) for alanine

§ Frequency of alanine in helix, sheet or coil

– 228 total alanines (119 in helix, 38 in sheet, 71 in coil)

– f_a = 119/228 = 0.522 helix, f_b = 38/228 = 0.167 sheet, f_c = 71/228 = 0.311 coil

§ Average frequencies

– <f_a> helix = 890/2473 = 0.360

– <f_b> sheet = 424/2473 = 0.171

– <f_c> coil = 1159/2473 = 0.469

§ P value calculation

– f_a/<f_a> = 0.522/0.360 = 1.45 for alanine (strong probability for helix)

– f_b/<f_b> = 0.167/0.171 = 0.97 (lower probability for sheet)

– f_c/<f_c> = 0.311/0.469 = 0.66 (low probability for coil)
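The alanine calculation can be checked with a short Python sketch. The function name is made up for this example, and the values are computed from the unrounded counts, so the last digit can differ slightly from numbers rounded by hand at each step.

```python
def conformational_parameters(aa_counts, dataset_counts):
    """Compute P = f / <f> for one amino acid in each category.

    aa_counts:      how many copies of this amino acid are in each category
    dataset_counts: how many residues of any type are in each category
    """
    total_aa = sum(aa_counts.values())
    total_all = sum(dataset_counts.values())
    params = {}
    for state in aa_counts:
        f = aa_counts[state] / total_aa             # e.g. 119/228
        f_avg = dataset_counts[state] / total_all   # e.g. 890/2473
        params[state] = f / f_avg
    return params


alanine = {"helix": 119, "sheet": 38, "coil": 71}
dataset = {"helix": 890, "sheet": 424, "coil": 1159}

for state, p in conformational_parameters(alanine, dataset).items():
    print(f"P_{state} = {p:.2f}")
# P_helix = 1.45
# P_sheet = 0.97
# P_coil = 0.66
```

The same function can be reused for any other amino acid once its counts in helix, sheet and coil are known.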

General Trends

(1) Glu, Ala, Leu strong alpha-helix formers

(2) Val, Ile, Tyr, Cys strong beta-sheet formers

(3) Gly, Pro strong coil formers/helix breakers

Manual Secondary Structure Prediction

§ Assign a set of P-values to the following sequence(s)

Arg Asn Ala Glu His Lys His Ala Glu Leu Gly Pro

P_a

P_b

P_c

§ Predict whether this span of amino acids is more likely to be alpha helix, beta sheet or coil

Computer Based Secondary Structure Prediction

We can also use the ExPASY Proteomics Server and Biology Workbench to make predictions for secondary structure elements for an input primary sequence.

ExPASY Style

Go to ExPASY

Under the Tools area highlight Secondary Structure Prediction

Under this section there are many methods to predict secondary structure from the primary sequence

You can also go back to P00698

This page has options to analyze the lysozyme sequence directly

Under the Sequences section use the pulldown under Tools

Select ProtScale and hit go

From here there are a number of different analysis tools

One is alpha-helix Chou-Fasman

Select and run this program

Biology Workbench Style

Go to the Biology Workbench site.

Select the lysozyme sequence

Run PELE

View the JOI (best composite prediction)

Tertiary Structure (3°) consists of the collapse of the secondary elements driven by the hydrophobic effect. The hydrophobic effect is explained by the placement of the non-polar amino acids into the interior of the folded protein. This increases the entropy of water, which is thought to be the driving force during protein folding. Bonding at this level involves sidechain or R-group interactions and includes the non-covalent bonding types: ion pairs, hydrogen bonds, and hydrophobic interactions. In addition, disulfide bonds, the second form of covalent bond, may exist at this level.

Quaternary Structure (4°) utilizes all of the bonding described for tertiary structure. Again, it is driven by interactions between R-groups. However, this higher order structure assembles monomeric tertiary structures into oligomers. Dimers, trimers, tetramers, pentamers, and hexamers are all commonly occurring types of oligomeric or quaternary structure. Each of these forms of quaternary structure is symmetrically arranged.

Lactate dehydrogenase (LDH) is displayed in its monomeric and tetrameric forms as colored by secondary structure.

Patterns, Motifs and Domains

Patterns or sites are small sections of consecutive amino acids that harbor a function or are a location subject to modification. Examples are phosphorylation (phosphate), glycosylation (sugar), and myristylation (fat) sites. These sites occur frequently in proteins because they are defined by only a few amino acids. Thus, within a typical protein having ~200 amino acids, the odds of finding a match to a pattern only three amino acids long are high.
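A rough estimate shows why such short patterns turn up so often. The Python sketch below assumes that all 20 amino acids are equally likely and that positions are independent, which is not true of real proteins, so treat the result as an order-of-magnitude estimate. The function name is made up; the protein kinase C site is written in PROSITE as [ST]-x-[RK], meaning 2, 20 and 2 allowed residues at the three positions.

```python
def expected_matches(allowed_per_position, seq_length, alphabet_size=20):
    """Expected number of chance matches to a short pattern in a random
    sequence, given the number of allowed residues at each position."""
    probability = 1.0
    for allowed in allowed_per_position:
        probability *= allowed / alphabet_size
    start_positions = max(seq_length - len(allowed_per_position) + 1, 0)
    return probability * start_positions


# [ST]-x-[RK] in a 200-residue protein
print(round(expected_matches([2, 20, 2], 200), 2))  # 1.98

# One exact tripeptide, for comparison
print(round(expected_matches([1, 1, 1], 200), 3))   # 0.025
```

Under these assumptions about two chance matches to the kinase site are expected in a 200-residue protein, whereas an exact tripeptide is rare. It is the degenerate positions in the pattern that make hits so common.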

Motifs (or super-secondary structures) are built from simple arrangements of secondary elements and are typically only structural. Some commonly recruited motifs include the beta-hairpin, the Greek Key, the Zinc Finger, the beta-alpha-beta motif, and the alpha helix-turn-alpha helix motif. These structural elements are stable, and several are used to connect beta-strand elements. The beta-hairpin connects adjacent anti-parallel strands, while the beta-alpha-beta motif connects parallel strands.

Some motifs are built from much smaller arrangements of amino acids that are quite far apart in the primary sequence. For example, the trypsin family catalytic triad contains Aspartic acid-102, Histidine-57, and Serine-195, located in close proximity within the active site. However, from the primary structure one would not predict that these three residues lie close enough together to form a functional motif within the trypsin active site. Only after a number of other biochemical and structural studies were conducted could these three residues be grouped into the trypsin family catalytic triad motif. A number of programs like PROSEARCH and PPSEARCH will search the PROSITE database for motifs and patterns.

Domains are known as functional units within proteins. Domain sizes range from a low of 36 amino acids up to 692 amino acids. The majority of domains have fewer than 200 amino acids and the average domain harbors 100 amino acids.

Conserved domains are functional regions within proteins that were pieced together during molecular evolution. In this fashion, new proteins with different sets of functions can be generated over time, leading to evolutionary changes. These types of domains appear as clusters of conserved amino acid sequence. In fact, these unique arrangements of amino acid clusters are used to identify the so-called conserved domain. Multiple sequence alignments provide information regarding where plausible conserved domains lie in a protein sequence.

A second type of domain classification is 3D domains. These domains are based upon known and conserved three-dimensional shapes of proteins. 3D domains are recruited during evolution as stably folded and functional units. A conserved domain may not yet have a representative 3D domain. A 3D domain prediction requires that a known 3D structure be provided for the comparison. The program SMART (Simple Modular Architecture Research Tool) can be used to find functional domains in primary sequences.

The SH3 domain is a good example of both a conserved and 3D domain. SH3 domains are beta-barrel shaped and contain five to six beta-strands oriented in anti-parallel fashion. SH3 domains typically recognize proline-rich sequences with a small consensus sequence –X-P-p-X-P where X = aliphatic, p = sometimes proline and P = always proline. SH3 domains are used to facilitate protein complex formation. SH3 domains are also thought to increase the substrate selectivity of some kinases.
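A consensus such as X-P-p-X-P can be turned into a regular expression and searched for in any sequence. The Python sketch below is an illustration only. It assumes that "aliphatic" means Ala, Val, Leu or Ile (other definitions exist) and treats the "sometimes proline" position as any residue. The function name and the test peptide are made up.

```python
import re

ALIPHATIC = "AVLI"  # assumed definition of X

# The lookahead (?=...) lets overlapping matches be reported.
CONSENSUS = re.compile(rf"(?=([{ALIPHATIC}]P.[{ALIPHATIC}]P))")


def find_xppxp(seq):
    """Return (1-based position, matched residues) for each X-P-p-X-P hit."""
    return [(m.start() + 1, m.group(1)) for m in CONSENSUS.finditer(seq)]


print(find_xppxp("GGAPPLPGG"))  # [(3, 'APPLP')]
```

Because the pattern is only five residues long, hits found this way are candidates to inspect further rather than proof of an interaction site.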

In the upper corner of the RCSB homepage the PDB Statistics link provides some interesting information regarding the various protein structures deposited to the RCSB PDB. One of the more interesting pieces of information is the number of folds determined by year. As you can see, by the mid-1980s there was a steady increase in the total number of protein folds. This increase reached a peak in 2007.

Pattern, Motif and Domain Prediction

There are a number of programs available to search primary sequence for sites, motifs and domains. These programs sift through databases that have compiled sets of unique sites, motifs, and domains. These databases grew rapidly in the 1980s and 90s based upon functional and structural studies on many new proteins.

Lysozyme

ExPASY Style

Go to ExPASY

Under the Tools & Software area click the Proteomics tools

Go to the Pattern and profile searches

Click on PPSearch

Select and run this program

Copy and paste the lysozyme sequence into the box

Click Yes under Include Abundant Patterns

The four PS links provide information on the patterns

Additional links are found on the pattern page

The PRU link provides the basic rule

The PDOC link provides a paragraph of information

Matching pattern PS00005 PKC_PHOSPHO_SITE:
43: TNR
Total matches: 1
Matching pattern PS00008 MYRISTYL:
26: GNWVCA
102: GNGMNA
Total matches: 2
Matching pattern PS00128 LACTALBUMIN_LYSOZYME_:
76: CNIPCSALLSSDITASVNC
Total matches: 1
Matching pattern PS00342 MICROBODIES_CTER:
127: CRL
Total matches: 1
Total no of hits in this sequence: 5

Biology Workbench Style

Select the lysozyme sequence

Submit this sequence to PPSEARCH

Click on Include redundant patterns

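The output matches the PPSEARCH output from ExPASY, except that every position is 18 higher. This is consistent with the 147-residue LYSC_CHICK entry including an 18-residue N-terminal signal peptide that the numbering in the ExPASY output above does not count.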

Sequence LYSC_CHICK (147 residues):
Matching pattern PS00005 PKC_PHOSPHO_SITE:
61: TNR
Total matches: 1
Matching pattern PS00008 MYRISTYL:
44: GNWVCA
120: GNGMNA
Total matches: 2
Matching pattern PS00128 LACTALBUMIN_LYSOZYME_:
94: CNIPCSALLSSDITASVNC
Total matches: 1
Matching pattern PS00342 MICROBODIES_CTER:
145: CRL
Total matches: 1
Total no of hits in this sequence: 5
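To see what PPSEARCH is doing, three of the patterns above can be written as regular expressions and run against the lysozyme sequence. The Python sketch below rests on two assumptions: the 129-residue mature sequence is typed in here and should be checked against the FASTA file you saved earlier, and the three patterns are transcriptions of the PROSITE definitions, which should be checked against the PDOC pages. PS00128 is left out. The scan function is made up for this example.

```python
import re

# Mature hen egg-white lysozyme (assumed; verify against your FASTA file).
LYSOZYME = (
    "KVFGRCELAAAMKRHGLDNYRGYSLGNWVC"
    "AAKFESNFNTQATNRNTDGSTDYGILQINS"
    "RWWCNDGRTPGSRNLCNIPCSALLSSDITA"
    "SVNCAKKIVSDGNGMNAWVAWRNRCKGTDV"
    "QAWIRGCRL"
)

# PROSITE patterns rewritten as regular expressions (assumed transcriptions).
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                # [ST]-x-[RK]
    "MYRISTYL": r"G[^EDRKHPFYW]..[STAGCN][^P]",      # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    "MICROBODIES_CTER": r"[STAGCN][RKH][LIVMAFY]$",  # anchored at the C-terminus
}


def scan(seq, patterns, offset=0):
    """Return {pattern name: [(1-based position + offset, match), ...]}."""
    hits = {}
    for name, regex in patterns.items():
        finder = re.compile(f"(?=({regex}))")  # lookahead allows overlaps
        hits[name] = [
            (m.start() + 1 + offset, m.group(1)) for m in finder.finditer(seq)
        ]
    return hits


print(scan(LYSOZYME, PATTERNS))             # numbering of the mature protein
print(scan(LYSOZYME, PATTERNS, offset=18))  # numbering of the 147-residue entry
```

With offset=18 the positions line up with the LYSC_CHICK numbering shown above, which is a quick way to convert between the two outputs.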

Entrez Style

Entrez can be used to search for motifs and domains

Click on CDD – conserved protein domain database

Click on search methods

Copy and paste the lysozyme sequence in the Protein Query Sequence window

Use CDD v2.18 as the Search Database

Hit submit

Mouse over the various domains found on the red triangles/lines

SMART Style

Enter the SMART site
Cut/paste the lysozyme sequence into the Sequence box
Select the PFAM domains to search
Hit Sequence SMART
The graphic below is interactive and provides lysozyme information

[Figure: SMART domain graphic for lysozyme]

Pyruvate Kinase

Patterns and Motifs ExPASY Style

Go to ExPASY

Under the Tools & Software area click the Proteomics tools

Go to the Pattern and profile searches

Click on PPSearch

Select and run this program

Copy and paste the pyruvate kinase sequence into the box

Click Yes under Include Abundant Patterns

The five PS links provide information on the patterns

Additional links are found on the pattern page

The PRU link provides the basic rule

The PDOC link provides a paragraph of information

Sequence /ebi/extserv/old-work/ppsearch-20100111-0657419760.input (530 residues):
Matching pattern PS00001 ASN_GLYCOSYLATION:
74: NFSH
Total matches: 1
Matching pattern PS00005 PKC_PHOSPHO_SITE:
40: TAR
59: TLK
86: TIK
138: TLK
204: SKK
221: SEK
364: TAK
419: SYK
433: SGR
458: TAR
523: TMR
Total matches: 11
Matching pattern PS00006 CK2_PHOSPHO_SITE:
3: SHSE
24: TFLE
59: TLKE
92: TATE
194: TEVE
221: SEKD
268: SKIE
340: TRAE
Total matches: 8
Matching pattern PS00008 MYRISTYL:
45: GIICTI
67: GMNVAR
121: GLIKGS
125: GSGTAE
199: GGFLGS
203: GSKKGV
288: GIMVAR
344: GSDVAN
414: GSVEAS
517: GSGFTN
Total matches: 10
Matching pattern PS00016 RGD:
293: RGD
Total matches: 1
Matching pattern PS00110 PYRUVATE_KINASE:
264: IKIISKIENHEGV
Total matches: 1
Total no of hits in this sequence: 32
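When the output is this long, it can help to read it into a table before picking which sites to highlight in your report. The Python sketch below assumes the plain-text layout shown above ("Matching pattern ..." header lines followed by "position: residues" lines). The parse_ppsearch function is made up for this example and is not part of PPSEARCH.

```python
import re


def parse_ppsearch(text):
    """Return {"PSxxxxx NAME": [(position, residues), ...]} from PPSEARCH text."""
    hits = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = re.match(r"Matching pattern (PS\d+) (\S+):", line)
        if header:
            current = f"{header.group(1)} {header.group(2)}"
            hits[current] = []
            continue
        site = re.match(r"(\d+):\s*(\S+)$", line)
        if site and current is not None:
            hits[current].append((int(site.group(1)), site.group(2)))
    return hits


# Excerpt copied from the pyruvate kinase output above.
excerpt = """Matching pattern PS00001 ASN_GLYCOSYLATION:
74: NFSH
Total matches: 1
Matching pattern PS00016 RGD:
293: RGD
Total matches: 1
Matching pattern PS00110 PYRUVATE_KINASE:
264: IKIISKIENHEGV
Total matches: 1
"""

for name, sites in parse_ppsearch(excerpt).items():
    print(name, sites)
```

Sorting the result by the number of sites per pattern quickly separates the frequent short patterns (phosphorylation, myristylation) from the single family-specific hit.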

Domains SMART Style

§ Go to the SMART site

§ Submit the pyruvate kinase sequence for analysis under the PFAM domains

Transmembrane Regions

Some proteins have a portion(s) of their amino acid sequence embedded within the lipid bilayer. These areas of the protein sequence that are embedded within a bilayer must be hydrophobic. There are bioinformatic programs that are able to predict the hydrophobicity of an amino acid sequence. Again both ExPASY and Biology Workbench have some of these programs accessible.

BIOLOGY WORKBENCH contains three programs for determining regions of hydrophobicity in a protein and potential membrane spanning domains. Enter the Biology Workbench and select your sequence under Protein Tools. Next select one of the programs listed below.

GREASE allows you to generate a Kyte-Doolittle Hydropathy Profile. This does not predict secondary structure, so it will detect both alpha helix and beta sheet transmembrane domains. Numbers greater than 0 indicate increased hydrophobicity; numbers less than 0 indicate an increase in hydrophilic amino acids.

TMHMM allows you to predict the location of transmembrane alpha helices and the location of intervening loop regions. This program will also predict which loops between the helices will be on the inside or outside of the cell or organelle. This program will not detect beta sheet transmembrane domains. It takes about 20 amino acids to span a lipid bilayer in an alpha helix. Programs can detect these transmembrane domains by looking for the presence of an alpha helix about 20 amino acids long that contains hydrophobic amino acids.

TMAP uses a Kyte-Doolittle Hydropathy Profile to detect transmembrane spanning domains. This does not require that the domain be an alpha helix, as in TMHMM. It also provides the amino acid numbers for the transmembrane domain. This is especially useful for detecting signal peptides. A signal peptide is a short hydrophobic sequence at the amino terminus of eukaryotic proteins targeted for the endoplasmic reticulum and often for secretion.

Transmembrane Prediction

A prediction of hydrophobic regions of proteins is based upon the Hydropathy Index. Values greater than zero indicate a hydrophobic nature, while values less than zero indicate hydrophilicity.
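A hydropathy profile of this kind is a sliding-window average of per-residue hydropathy values. The Python sketch below uses the Kyte-Doolittle scale. It is not the GREASE program itself, and the window of 9 residues is an assumption (the window GREASE uses is not stated here). The function name and test peptide are made up.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return [(1-based center position, mean hydropathy)] for each full
    window. The window is assumed to be an odd number of residues."""
    half = window // 2
    profile = []
    for center in range(half, len(seq) - half):
        segment = seq[center - half:center + half + 1]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((center + 1, score))
    return profile


# Made-up peptide: a hydrophobic stretch flanked by charged residues.
for position, score in hydropathy_profile("DDKKLLIVVLLAIDDKK"):
    print(position, round(score, 2))
```

Stretches where the average stays well above zero for roughly 20 residues are the candidates for membrane-spanning segments.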

Lysozyme

§ Under BIOLOGY WORKBENCH, go to protein tools

§ Select lysozyme

§ Run through GREASE, TMHMM, and TMAP

Grease Output

[Figure: GREASE hydropathy plot for lysozyme]

TMAP Output

PREDICTED TRANSMEMBRANE SEGMENTS

TM 1: 6 - 28 (23)

TMHMM Output

0_LYG_CHIC Length: 211

0_LYG_CHIC Number of predicted TMHs: 1

0_LYG_CHIC Exp number of AAs in TMHs: 20.46703

0_LYG_CHIC Exp number, first 60 AAs: 20.18892

0_LYG_CHIC Total prob of N-in: 0.45484

0_LYG_CHIC POSSIBLE N-term signal sequence

0_LYG_CHIC TMHMM2.0 outside 1 9

0_LYG_CHIC TMHMM2.0 TMhelix 10 32

0_LYG_CHIC TMHMM2.0 inside 33 211

Rhodopsin

§ Under BIOLOGY WORKBENCH, go to protein tools

§ Use NDJINN to locate files containing the protein sequences of rhodopsin (HSU49742)

§ Use the GBPRI to only search primate sequences

§ Import this sequence and run through GREASE, TMHMM, and TMAP

Grease Output

TMAP Output

PREDICTED TRANSMEMBRANE SEGMENTS

TM 1: 45 - 71 (27)

TM 2: 75 - 99 (25)

TM 3: 114 - 142 (29)

TM 4: 150 - 178 (29)

TM 5: 203 - 231 (29)

TM 6: 257 - 277 (21)

TM 7: 284 - 304 (21)

TMHMM Output

0_1236136_123613 Length: 348
0_1236136_123613 Number of predicted TMHs: 7
0_1236136_123613 Exp number of AAs in TMHs: 157.99471
0_1236136_123613 Exp number, first 60 AAs: 21.69013
0_1236136_123613 Total prob of N-in: 0.00977
0_1236136_123613 POSSIBLE N-term signal sequence
0_1236136_123613 TMHMM2.0 outside 1 38
0_1236136_123613 TMHMM2.0 TMhelix 39 61
0_1236136_123613 TMHMM2.0 inside 62 73
0_1236136_123613 TMHMM2.0 TMhelix 74 96
0_1236136_123613 TMHMM2.0 outside 97 110
0_1236136_123613 TMHMM2.0 TMhelix 111 133
0_1236136_123613 TMHMM2.0 inside 134 152
0_1236136_123613 TMHMM2.0 TMhelix 153 175
0_1236136_123613 TMHMM2.0 outside 176 201
0_1236136_123613 TMHMM2.0 TMhelix 202 224
0_1236136_123613 TMHMM2.0 inside 225 253
0_1236136_123613 TMHMM2.0 TMhelix 254 276
0_1236136_123613 TMHMM2.0 outside 277 285
0_1236136_123613 TMHMM2.0 TMhelix 286 308
0_1236136_123613 TMHMM2.0 inside 309 348
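To compare the TMHMM and TMAP predictions side by side, the helix ranges can be pulled out of the TMHMM text and overlapped with the TMAP segments. The Python sketch below assumes the five-column TMHMM line layout shown above. The function names are made up, and only the first two helices are included to keep the example short.

```python
def parse_tmhmm(text):
    """Return [(start, end)] for every TMhelix line in TMHMM text output."""
    helices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[1] == "TMHMM2.0" and fields[2] == "TMhelix":
            helices.append((int(fields[3]), int(fields[4])))
    return helices


def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Lines copied from the rhodopsin TMHMM output above.
tmhmm_text = """\
0_1236136_123613 TMHMM2.0 outside 1 38
0_1236136_123613 TMHMM2.0 TMhelix 39 61
0_1236136_123613 TMHMM2.0 inside 62 73
0_1236136_123613 TMHMM2.0 TMhelix 74 96
"""

# First two segments from the rhodopsin TMAP output above.
tmap_segments = [(45, 71), (75, 99)]

helices = parse_tmhmm(tmhmm_text)
for helix, segment in zip(helices, tmap_segments):
    print(helix, segment, overlap(helix, segment))
```

Large overlaps between the two programs suggest that both are reporting the same membrane-spanning segment, even when the exact boundaries differ by a few residues.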

Report for Unit 4 (50 points total)

We would like a formal written report with the following information. Don't paste in the questions, these are just to help you be organized. You can create figures in your report by right clicking on an image, and then copy and paste it into your report. Don't add lots of extra output, i.e. names and accession numbers from Biology Workbench, just the figure and a figure legend, and then explain what it means in your well-written, rational report.

1. Perform a BLASTP on your assigned sequence against the PDBFINDER (sequence of the protein from a crystal structure) and SWISSPROT-HUMAN (sequence from the DNA) databases in Biology Workbench (you can select both simultaneously using the Ctrl key) or use BLASTP at the ExPASY site directly.

Include a brief one paragraph description of the protein you were assigned, i.e. what is it, what does it do, is it in any biochemical pathways, in what organs is it found, is it intra or extracellular, does it have a signal peptide or transmembrane domains, etc. You can use a number of sites to identify the function of your assigned protein (PROSEARCH, PPSEARCH, SMART, GREASE, TMHMM, TMAP, ExPASY UniPathway, ExPasy Enzyme, etc.). To help guide you, use the information on patterns, motifs, etc. below.

Patterns and motifs

Use either PPSEARCH or PROSEARCH to find patterns and small motifs. You can use these in either ExPasy or Biology Workbench Style.

Domains

Use the SMART site to search for functional domains on your sequence.

Transmembrane Domains

Use GREASE, TMHMM, and TMAP to predict any transmembrane domains. Use the Biology Workbench site for this.

Include output from these programs to support your discussion.

ExPASY UniPathway, ExPasy Enzyme

Use these sites under the ExPASY homepage to help identify your protein (pathway if a metabolic enzyme). ExPASY UniPathway may provide information on a pathway for your protein, while ExPASY Enzyme provides information on your enzyme. If you use the Biochemical Pathways link you can also get an image of where your enzyme functions. The Related tools and databases under the Enzyme link have other sources of information as well.

2. A PyMol image containing the 3D structure of your protein in cartoon in three forms.

A. One form with the secondary structures colored by type.

Include a comparison between what the secondary structure prediction output predicted and what the actual 3D structure depicts in your image A.

B. A second image drawn as cartoon with at least one representative of each pattern and motif found using PPSEARCH OR PROSEARCH. Each motif/pattern should have its own color and represent the sidechains as sticks color coordinated.

Include a description of the motifs found and highlighted in your image B.

C. A third image (if possible) that uses the SMART output to color code multiple domains independently. This may not be possible for all provided amino acid sequences. If you have a predicted domain(s) color these separately on a cartoon drawing.

Include a description of the domains found and their functional or structural importance to your protein as drawn in image C.

3. A description of the family of proteins to which your protein belongs (paralogs, Lab 4.2). For this portion of the unit we would like you to identify and import 6-7 related human protein sequences (not 6-7 different sequences of the same protein) and align these sequences. Try to choose several paralogs, not just the most closely related, but don't go much below score of 100 or the alignments won't be very good. If there is a known motif for the class of protein your protein falls into, look for this motif in your aligned sequences.

Include a picture of the amino acid sequence alignment and the 3D alignment from Consurf (Lab 4.3). Which regions seemed to be conserved and is this consistent with what you know about the active site or binding site of the protein?

4. A description of the evolution of your protein in different species (orthologs, Lab 4.2). For this portion of the unit we would like you to identify and import 6-7 protein sequences of this same protein from different species and align these sequences. Try to choose some distantly related species for comparison, i.e. can you find this protein in yeast or bacteria? If there is a known motif for the class of protein your protein falls into, look for this motif in your aligned sequences.

Include a picture of the amino acid sequence alignment and the 3D alignment from Consurf (Lab 4.3). Which regions seemed to be conserved and is this consistent with what you know about the active site or binding site of the protein? If both the paralog and ortholog alignments worked include both in your final report. If not, only include the one that seems to have worked well.

5. A conclusion summarizing your findings. Specifically comment on the following

Where is the most sequence similarity seen on the 3D structural alignments? Were orthologs or paralogs more highly conserved? Is this consistent with the relative functions of orthologs and paralogs?

Include how these bioinformatics tools have helped you to better understand the evolution and function of your assigned protein. If there were limitations to the programs, mention these as well, i.e. how accurate were the bioinformatics programs you used in predicting motifs, secondary structure, etc.

Note: When writing this report do not simply attach the output from various programs at the end. You MUST embed all output within the report and near where you are discussing its relevance. This will take some organizational work on your part. It makes no sense to talk about the patterns, motifs, and domains on page 1 of the report and have the supporting output on page 5. I will NOT accept reports written where the images from various programs are simply added on to the end.

Click here to email comments to Scott Cooper regarding this site or its links.