Lab 4.1 Proteomics
The goal of the last labs is to become familiar with the programs used to
analyze protein sequences and structures, using some example proteins. You
will each be assigned a protein to explore individually, using the techniques we
went through in lab. There should be time during lab for you to work on your
individual projects, although some work may need to be done outside of class.
This section focuses on proteomics, which is the study of protein structure and
function using computers, databases, and protein sequences and structures. The
first section provides a graphical review of protein structure from the primary
through the quaternary level.
Protein Structure
Protein structures can be represented in a number of ways.
Cartoon versions are often used to emphasize where secondary structure
elements occur. Below, the same segment spanning amino acids Leu25 through
Glu35 is represented in three ways (cartoon, lines and sticks, and spheres). In
a cartoon drawing only the mainchain is represented, and all sidechain
information is left out of the final representation. This viewing option allows
for clarity and makes secondary elements easy to recognize. Lines and sticks are
often used for smaller sections of a protein; the lines and sticks represent the
bonds shared between atoms. In addition, each atom can be represented by a
unique color. In the drawing below carbon atoms are green, nitrogen blue, oxygen
red and sulfur yellow. Finally, protein structures can be represented with each
atom drawn as a sphere (volume). This type of representation shows how compact
a protein is.
Lysozyme will be used to emphasize certain aspects
of protein structure. Below is a cartoon version of lysozyme in two
orientations. The cartoon drawing shows the secondary structures colored
by type: red (alpha-helices), yellow (beta-strands) and green (random
coil).
Rhodopsin will be used to emphasize transmembrane
domains at the end of this lab.
Primary structure (1°)
is encompassed by the amino acid sequence. Adjacent amino acids are linked via a
peptide bond. The
NCBI web site has a useful amino acid analysis interface under the ALL
Resources (A-Z) link and then under
Amino Acid Explorer.
Primary
structure is typically defined as the sequence of amino acids read from the
amino terminus to the carboxy terminus. Adjacent amino acids are covalently
connected via a peptide bond formed on the ribosome by peptidyl transferase. The
ribosomal RNA of the large ribosomal subunit, not its proteins, provides the
peptide bond-forming activity, so the ribosome acts as a ribozyme (an RNA
enzyme). The primary sequence is typically written in one-letter code, with
amino acid 1 carrying the free amino terminus. Protein structures drawn in stick
and line format are typically colored by element: carbon green, oxygen red,
nitrogen blue, and sulfur yellow. The user may modify these colors, but the
colors for oxygen, nitrogen and sulfur are typically left unchanged.
The primary sequence can be used to predict potential
secondary structural elements. In addition, a number of primary sequences of the
same protein from different sources can be aligned for identities and
similarities. This type of alignment comparison is useful during the prediction
of what are termed conserved domains. More on this later.
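Once an alignment exists, counting identities is straightforward. The minimal sketch below assumes two sequences have already been aligned, with '-' marking gaps. It reports percent identity over the columns where neither sequence has a gap, which is only one of several conventions in use. The function name and the two fragments are hypothetical, chosen for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns that hold identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip any column that contains a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical, already-aligned fragments (illustration only)
print(percent_identity("KVFGRCELA-AM", "KVYGRCEL-AAM"))  # 9 of 10 gap-free columns identical
```

Similarity, as opposed to identity, would also credit conservative substitutions such as F/Y using a substitution matrix. That step is not shown here.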
Primary Sequence Search
There are many ways to search, import, view and analyze
primary sequences from a variety of databases. A very commonly used
proteomics server is
ExPASY. On the ExPASY site we can go to the Databases section and find
a protein sequence by name and organism. We can also search for a protein
sequence using NDJINN within
Biology Workbench.
ExPASY Style
- Go to the ExPASY site and, in the open box in the upper right hand corner, search for lysozyme.
- Under the Databases click on UniProtKB
- In the Query window type lysozyme
- Click on the Fields link
- Pulldown to Organism
- Type in Chicken
- Click Add & Search
- Click on P00698
- Under the Sequences area click on the FASTA link
- Copy and paste just the protein sequence into a file (use the Notepad program)
- This sequence will be used later
The P00698 entry page provides a lot of information based upon the
primary sequence you selected for lysozyme: the name of the protein, the length
of the sequence, the source organism, the function of the protein, important
sites and amino acids within the protein, and even a map of the secondary
structural elements (more on this later).
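If you would rather work with the sequence in a script than in Notepad, the FASTA text can be parsed directly. The sketch below assumes a single-record FASTA file like the one saved above. The header line is shortened here, and `read_fasta` and `FASTA_TEXT` are illustrative names. For a saved file you would pass `open(path).read()`, where `path` is a placeholder. Residues 1-18 of P00698 form the signal peptide, so slicing them off gives the 129-residue mature chain. This is consistent with the 18-residue numbering offset between the two lysozyme PPSEARCH outputs later in this lab.

```python
# FASTA text for P00698 as copied from UniProtKB (header line shortened)
FASTA_TEXT = """>sp|P00698|LYSC_CHICK Lysozyme C
MRSLLILVLCFLPLAALGKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQA
TNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDG
NGMNAWVAWRNRCKGTDVQAWIRGCRL
"""


def read_fasta(text: str) -> dict:
    """Map each record's identifier (first word of its header) to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


sequence = read_fasta(FASTA_TEXT)["sp|P00698|LYSC_CHICK"]
mature_chain = sequence[18:]  # drop residues 1-18, the signal peptide
print(len(sequence), len(mature_chain), mature_chain[:10])
```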
Biology Workbench Style
- Go to the Biology Workbench site.
- In the Session Tools area Start New Session
- Go into the Protein Tools area
- Under the program area click on NDJINN
- In the white box type in lysozyme AND chick
- Select the SWISSPROT database to search
- Click Search
- Select the first sequence SWISSPROT:LYG_CHICK
- Import this sequence
Once you have this sequence in your Protein Tools
area you can subject it to a number of other programs within Biology
Workbench.
Secondary structure (2°)
forms via a repeating pattern of hydrogen bonds shared between mainchain NH and
CO groups. There are two common forms of secondary structural elements, termed
alpha helix and beta-sheet.
Alpha helices often have a sidedness to their appearance,
where one side is predominantly polar and the other is clearly hydrophobic. This
type of alpha-helix is termed amphipathic.
The alpha helix within lysozyme is shown as part of the
complete 3D structure. The underside of the helix carries the hydrophobic face,
while the polar face projects out toward solvent.
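The sidedness of an amphipathic helix can be put into numbers with a hydrophobic moment. Each residue sits 100 degrees further around the helix axis (3.6 residues per turn), and its hydrophobicity is added as a vector pointing in that direction. A helix that is hydrophobic all the way around gives a small total, while one hydrophobic face gives a large one. The minimal sketch below assumes a simple 1/0 hydrophobic/polar scale; published analyses use graded scales instead. The two 18-residue peptides are hypothetical.

```python
import math

# Assumption: binary scale, 1 for these hydrophobic residues and 0 for all others
HYDROPHOBIC = set("AVLIMFWC")


def hydrophobic_moment(seq: str, degrees_per_residue: float = 100.0) -> float:
    """Magnitude of the summed hydrophobicity vectors around a helical wheel."""
    x = y = 0.0
    for n, residue in enumerate(seq.upper()):
        h = 1.0 if residue in HYDROPHOBIC else 0.0
        angle = math.radians(n * degrees_per_residue)  # 100 degrees per residue
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y)


# Hypothetical 18-residue peptides
uniform = "L" * 18                  # hydrophobic all the way around
amphipathic = "LKKLLKKLLKKLLKKLLK"  # Leu at i, i+3, i+4, i+7, ... forms one face
print(round(hydrophobic_moment(uniform), 3))
print(round(hydrophobic_moment(amphipathic), 3))
```

Eighteen residues at 100 degrees each make exactly five turns, so the vectors of the uniform peptide cancel to zero. The amphipathic peptide gives a clearly non-zero moment.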
Beta helix structure is another way parallel beta-strands
can be used to build tertiary and quaternary structure. This structure folds
progressively from the top to the bottom by wrapping parallel beta-strands into
coils, much like wrapping wire around a pencil. The images below show the coil
formed by the parallel strands.
Secondary Structure Prediction
Secondary structure predictions can be made using a number
of different programs. These programs are based upon many known 3D
protein structures. Protein structures have been solved and deposited into a
database known as the
RCSB Protein Data Bank (PDB) for over 33 years. The first 13 protein
structures were solved and deposited in 1976, and as of January 1, 2010 over
62,000 protein structures had been determined and submitted to the PDB. One of
the original algorithms written to predict secondary structure was designed by
two scientists, Chou and Fasman. Their
Chou-Fasman Method has been used successfully to predict many
secondary elements from the primary sequence alone.
Chou-Fasman Method
(1) Used a set of known protein structures from the
RCSB PDB (known structures with the locations of alpha helices, beta sheets, and
random coils), so they also knew which amino acids were in each type
of secondary element.
(2) Assigned each amino acid a probability of being found in
a helix, a sheet or coil (no regular secondary element)
(3) Designed a scanning algorithm for prediction of
secondary structure from linear amino acid sequences
Experimental Dataset
(1) Select 15 known protein structures with 2473 total amino acids (AA)
(2) Break down where these 2473 AA were located (helix, sheet, coil)
(3) Derive a normalization procedure to predict
alpha – common AA are glu, ala, leu, and his
beta – common AA are met, val, ile, cys
coil – common AA are gly, ser, pro, asn
Normalization Factor
(1) Define f = frequency of a given AA in helix, sheet or coil (# of that AA
in the secondary element / total # of that AA)
(2) Define average frequency <f> = sum of f for all AA in a category / 20 AA
(provides the average frequency of an AA in helix, sheet or coil; the worked
example below uses the fraction of all residues in each state, e.g. 890/2473
for helix)
(3) Define the protein conformational parameter for each AA
as P = f/<f> (provides a normalization factor for predicting whether an AA is
found in helix, sheet or coil)
(4) P > 1.0 indicates that the AA favors
that secondary element
Prediction Rules
(1) Nucleation Point - cluster of 4 helix formers (Pa
> 1.0) or 3 out of 5 beta formers (Pb > 1.0)
(2) Helix/Beta Termination - extend in both directions
until a tetrapeptide with average P < 1.0 is reached
(3) Pro is not found in helices, and neither Glu nor Pro is found in sheets
(4) Boundaries - Pro, Asp, Glu prefer the N-terminal end;
His, Lys, Arg prefer the C-terminal end (due to the dipole of the helix)
(5) A beta-sheet needs 5 or more amino acids, with Pb
> 1.05 and Pb > Pa
for that region
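Rule (1) can be expressed as a sliding-window scan over per-residue P values. The sketch below assumes the commonly cited reading of the helix rule: at least 4 helix formers (Pa > 1.0) within any 6 consecutive residues. The strand rule is the same scan with a window of 5 and a minimum of 3 formers. `nucleation_sites` is a hypothetical helper, and the Pa values in the example are placeholders, not Chou-Fasman parameters. Extension (rule 2) and rules (3)-(5) are not implemented.

```python
def nucleation_sites(p_values, window, min_formers, threshold=1.0):
    """Return 1-based (start, end) windows in which at least `min_formers`
    residues have a P value above `threshold`."""
    sites = []
    for start in range(len(p_values) - window + 1):
        block = p_values[start:start + window]
        formers = sum(1 for p in block if p > threshold)
        if formers >= min_formers:
            sites.append((start + 1, start + window))
    return sites


# Hypothetical per-residue Pa values for a 10-residue stretch
pa = [0.6, 1.4, 1.2, 0.8, 1.3, 1.1, 0.9, 0.7, 0.6, 0.5]
print(nucleation_sites(pa, window=6, min_formers=4))  # [(1, 6), (2, 7)]
```

Overlapping windows such as (1, 6) and (2, 7) describe one nucleation region. The region would then be extended in both directions under rule (2).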
Example Calculation
(1) 2473 total amino acids (AA)
(2) 890 AA in alpha-helices, 424 AA in beta-sheet, 1159 AA in coils
(3) Normalization Factor (P value) for alanine
§ Frequency of alanine in helix, sheet or coil
– 228 total alanines (119 in helix, 38 in sheet, 71 in coil)
– fa = 119/228 = 0.522 helix, fb = 38/228 = 0.167 sheet, fc = 71/228 = 0.311 coil
§ Average frequencies
– <fa> helix = 890/2473 = 0.360
– <fb> sheet = 424/2473 = 0.171
– <fc> coil = 1159/2473 = 0.469
§ P value calculation (from the unrounded counts)
– Pa = fa/<fa> = (119/228)/(890/2473) = 1.45 for alanine (strong probability for helix)
– Pb = fb/<fb> = (38/228)/(424/2473) = 0.97 (lower probability for sheet)
– Pc = fc/<fc> = (71/228)/(1159/2473) = 0.66 (low probability for coil)
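The worked example can be checked with a few lines of code. This sketch uses the dataset and alanine counts given above. It takes <f> as the overall fraction of residues in each state, as the example does, and divides the unrounded fractions. On that basis the coil value comes out at 0.66, since (71/228)/(1159/2473) is about 0.664. The function and variable names are illustrative.

```python
# Counts taken from the worked example (15 proteins, 2473 residues)
TOTAL_RESIDUES = 2473
RESIDUES_IN_STATE = {"helix": 890, "sheet": 424, "coil": 1159}
ALANINE_IN_STATE = {"helix": 119, "sheet": 38, "coil": 71}


def conformational_parameters(aa_counts, state_counts, total):
    """Return P = f / <f> for one amino acid in each conformational state."""
    aa_total = sum(aa_counts.values())  # 228 alanines in the example
    params = {}
    for state, n_aa in aa_counts.items():
        f = n_aa / aa_total                  # fraction of this AA found in the state
        f_avg = state_counts[state] / total  # fraction of all residues in the state
        params[state] = f / f_avg
    return params


p_ala = conformational_parameters(ALANINE_IN_STATE, RESIDUES_IN_STATE, TOTAL_RESIDUES)
for state, p in p_ala.items():
    print(f"P_{state}(Ala) = {p:.2f}")
```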
General Trends
(1) Glu, Ala, Leu strong alpha-helix formers
(2) Val, Ile, Tyr, Cys strong beta-sheet formers
(3) Gly, Pro strong coil formers/helix breakers
Manual Secondary Structure Prediction
§ Assign a set of P-values to the following sequence(s)
Residue: Arg Asn Ala Glu His Lys His Ala Glu Leu Gly Pro
Pa:
Pb:
Pc:
§ Predict whether this span of amino acids is more likely to be
alpha helix, beta sheet or coil
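After you have filled in the Pa, Pb and Pc rows, the final comparison amounts to averaging each row over the span and picking the largest. The minimal sketch below works under that assumption. `average_propensities` is a hypothetical helper, and its P values are placeholders for a 4-residue span. Use your own completed table rather than these numbers.

```python
def average_propensities(pa, pb, pc):
    """Average each row of P values over a span; return the averages and the largest."""
    averages = {
        "alpha helix": sum(pa) / len(pa),
        "beta sheet": sum(pb) / len(pb),
        "coil": sum(pc) / len(pc),
    }
    best = max(averages, key=averages.get)
    return averages, best


# Hypothetical P values for a 4-residue span (placeholders, not looked-up values)
averages, best = average_propensities(
    pa=[1.2, 1.0, 1.4, 0.8],
    pb=[0.7, 1.1, 0.9, 0.8],
    pc=[0.9, 0.8, 0.6, 1.3],
)
print({state: round(value, 3) for state, value in averages.items()})
print("Most likely:", best)
```

This comparison ignores the stricter sheet criterion in rule (5), which requires average Pb > 1.05 as well as Pb > Pa.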
Computer-Based Secondary Structure Prediction
We can also use the EXPASY Proteomics Server and Biology
Workbench to make predictions for secondary structure elements for an input
primary sequence.
ExPASY Style
- Go to ExPASY
- Under the Tools area highlight Secondary Structure Prediction
- Under this section there are many methods to predict secondary structure from the primary sequence
- You can also go back to P00698
- This page has options to analyze the lysozyme sequence directly
- Under the Sequences section use the pulldown under Tools
- Select ProtScale and hit Go
- From here there are a number of different analysis tools
- One is alpha-helix Chou-Fasman
- Select and run this program
Biology Workbench Style
- Go to the Biology Workbench site.
- Select the lysozyme sequence
- Run PELE
- View the JOI (best composite prediction)
Tertiary Structure (3°)
consists of the collapse of the secondary elements, driven by the hydrophobic
effect: the non-polar amino acids are placed into the interior of the folded
protein. Burying these non-polar sidechains releases the ordered water that
surrounded them and so increases the entropy of water, which is thought to be
the driving force of protein folding. Bonding at this level involves sidechain
or R-group interactions, including the non-covalent bonding types: ion pairs,
hydrogen bonds, and hydrophobic interactions. In addition, disulfide bonds, the
second form of covalent bond in proteins (after the peptide bond), may exist at
this level.
Quaternary Structure (4°)
uses all of the bonding described for tertiary structure. Again, it is driven
by interactions between R-groups. However, this higher order structure assembles
monomeric tertiary structures into oligomers. Dimers, trimers, tetramers,
pentamers, and hexamers are all commonly occurring types of oligomeric or
quaternary structure. These quaternary structures are typically arranged
symmetrically.
Lactate dehydrogenase (LDH) is displayed in its
monomeric and tetrameric forms, colored by secondary structure.
Patterns, Motifs and Domains
Patterns or sites are small sections of consecutive
amino acids that harbor a function or are a location subject to modification.
Examples are phosphorylation (phosphate), glycosylation (sugar), and
myristylation (fatty acid) sites. These sites occur repeatedly in proteins
because they are defined by only a few amino acids. Thus, within a typical
protein of ~200 amino acids, a chance match to such a short pattern is common.
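The point about chance matches can be made concrete with a back-of-the-envelope estimate. Assume, for simplicity, that all 20 amino acids are equally frequent and independent, which real proteins are not. The expected number of chance matches is then the number of possible start positions times the probability that every position fits. The sketch applies this to the PROSITE PKC site pattern [ST]-x-[RK] and the N-glycosylation pattern N-{P}-[ST]-{P} in a 200-residue protein. `expected_matches` is an illustrative name.

```python
def expected_matches(position_choices, protein_length, alphabet_size=20):
    """Expected chance matches of a fixed-length pattern.

    position_choices: number of allowed amino acids at each pattern position.
    Assumes uniform, independent residue frequencies.
    """
    p_match = 1.0
    for allowed in position_choices:
        p_match *= allowed / alphabet_size
    start_positions = protein_length - len(position_choices) + 1
    return start_positions * p_match


# [ST]-x-[RK]: 2 choices, any of 20, 2 choices
print(round(expected_matches([2, 20, 2], 200), 2))
# N-{P}-[ST]-{P}: 1 choice, 19 (not Pro), 2 choices, 19 (not Pro)
print(round(expected_matches([1, 19, 2, 19], 200), 2))
```

Under these assumptions about two PKC-site matches are expected by chance alone in a 200-residue protein. A PPSEARCH hit to such a pattern is therefore a hint to follow up, not proof of modification.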
Motifs (or super-secondary structure) are built from
simple arrangements of secondary elements and are typically only structural.
Commonly recruited structural motifs include the beta-hairpin, the Greek
Key, the Zinc Finger, the beta-alpha-beta motif, and the alpha helix-turn-alpha
helix motif. Some of these, such as the beta-hairpin and the beta-alpha-beta
motif, are stable units used to connect beta-strand elements. The beta-hairpin
connects adjacent anti-parallel strands, while the beta-alpha-beta motif
connects parallel strands.
Some motifs are built from much smaller arrangements of
amino acids that are quite far apart in the primary sequence. For example, the
trypsin family catalytic triad contains Aspartic acid-102, Histidine-57, and
Serine-195, which lie in close proximity within the active site. However, from
the primary structure one would not predict that these three residues lie close
together to form a functional motif within the trypsin active site. Only after a
number of other biochemical and structural studies had been conducted could
these three residues be grouped into the trypsin family catalytic triad motif.
Programs such as PROSEARCH and PPSEARCH search the PROSITE database for motifs
and patterns.
Domains are known as functional units within
proteins. Domain sizes range from a low of 36 amino acids up to 692
amino acids. The majority of domains have fewer than 200 amino acids, and the
average domain contains about 100 amino acids.
Conserved domains are functional regions
within proteins that have been pieced together during molecular evolution. In
this fashion, new proteins with different sets of functions can be generated
over time, leading to evolutionary change. These domains appear as clusters of
conserved amino acids in the sequence; in fact, these unique clusters are used
to identify a conserved domain. Multiple sequence alignments provide
information regarding where plausible conserved domains lie in a protein
sequence.
A second type of domain classification is the 3D domain.
These domains are based upon known and conserved three-dimensional shapes of
proteins. 3D domains are recruited during evolution as stably folded and
functional units. A conserved domain may not yet have a representative 3D
domain. A 3D domain prediction requires that a known 3D structure be provided
for the comparison. The program
SMART (Simple Modular Architecture Research Tool) can be used to
find functional domains in primary sequences.
The
SH3 domain is a good example of both a conserved and a 3D domain. SH3 domains
are beta-barrel shaped and contain five to six beta-strands oriented in
anti-parallel fashion. SH3 domains typically bind short proline-rich sequences
with the consensus X-P-p-X-P, where X = aliphatic, p = sometimes proline and
P = always proline. SH3 domains are used to facilitate protein complex
formation. SH3 domains are also thought to increase the substrate selectivity of
some kinases.
In the upper corner of the
RCSB homepage the
PDB Statistics link provides some interesting information regarding the
various protein structures deposited in the RCSB PDB. One of the more
interesting pieces of information is the number of folds determined by year.
Beginning in the mid-1980s there was a steady increase in the total number of
protein folds, and this increase reached a peak in 2007.
Pattern, Motif and Domain Prediction
There are a number of programs available to search primary
sequences for sites, motifs and domains. These programs sift through databases
that have compiled sets of unique sites, motifs, and domains. These databases
grew rapidly in the 1980s and 90s based upon functional and structural studies
of many new proteins.
Lysozyme
ExPASY Style
- Go to ExPASY
- Under the Tools & Software area click the Proteomics tools
- Go to the Pattern and profile searches
- Click on PPSearch
- Select and run this program
- Copy and paste the lysozyme sequence into the box
- Click Yes under Include Abundant Patterns
- The four PS links provide information on the patterns
- Additional links are found on the pattern page
- The PRU link provides the basic rule
- The PDOC link provides a paragraph of information
Matching pattern PS00005 PKC_PHOSPHO_SITE:
  43: TNR
  Total matches: 1

Matching pattern PS00008 MYRISTYL:
  26: GNWVCA
  102: GNGMNA
  Total matches: 2

Matching pattern PS00128 LACTALBUMIN_LYSOZYME_:
  76: CNIPCSALLSSDITASVNC
  Total matches: 1

Matching pattern PS00342 MICROBODIES_CTER:
  127: CRL
  Total matches: 1

Total no of hits in this sequence: 5
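Biology Workbench Style
- Select the lysozyme sequence
- Submit this sequence to PPSEARCH
- Click on Include redundant patterns
- The output matches PPSEARCH from ExPASY, except that every position is 18 higher: LYSC_CHICK is the full-length 147-residue sequence including the 18-residue signal peptide, while the ExPASY positions above are consistent with the 129-residue mature chain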
Sequence LYSC_CHICK (147 residues):

Matching pattern PS00005 PKC_PHOSPHO_SITE:
  61: TNR
  Total matches: 1

Matching pattern PS00008 MYRISTYL:
  44: GNWVCA
  120: GNGMNA
  Total matches: 2

Matching pattern PS00128 LACTALBUMIN_LYSOZYME_:
  94: CNIPCSALLSSDITASVNC
  Total matches: 1

Matching pattern PS00342 MICROBODIES_CTER:
  145: CRL
  Total matches: 1

Total no of hits in this sequence: 5
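For simple patterns, PPSEARCH-style matches can be reproduced by translating PROSITE pattern syntax into a regular expression. In that translation, x becomes any residue, [..] stays a character class, {..} becomes an excluded class, (n) or (n,m) becomes a repeat count, and < or > anchors the N- or C-terminus. The sketch handles only those elements. `prosite_to_regex` and `find_pattern` are illustrative names, not part of PPSEARCH. The two patterns are written from the PROSITE definitions of PS00005 and PS00008. The sequence is full-length P00698, so positions follow the LYSC_CHICK numbering.

```python
import re

# Full-length P00698 (LYSC_CHICK), 147 residues including the signal peptide
LYSOZYME = (
    "MRSLLILVLCFLPLAALGKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQA"
    "TNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDG"
    "NGMNAWVAWRNRCKGTDVQAWIRGCRL"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except those listed
        else:
            core = element  # single residue or [..] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_pattern(pattern: str, sequence: str):
    """Return (1-based position, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(find_pattern("[ST]-x-[RK]", LYSOZYME))                       # PS00005
print(find_pattern("G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}", LYSOZYME))   # PS00008
```

The lookahead in `find_pattern` lets overlapping matches be reported, which short PROSITE patterns can produce.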
Entrez Style
- Entrez can be used to search for motifs and domains
- Click on CDD – conserved protein domain database
- Click on search methods
- Copy and paste the lysozyme sequence in the Protein Query Sequence window
- Use CDD v2.18 as the Search Database
- Hit submit
- Mouse over the various domains found on the red triangles/lines
SMART Style
- Enter the SMART site
- Cut/paste the lysozyme sequence into the Sequence box
- Select the PFAM domains to search
- Hit Sequence SMART
- The graphic below is interactive and provides lysozyme information
Pyruvate Kinase
Patterns and Motifs
ExPASY Style
- Go to ExPASY
- Under the Tools & Software area click the Proteomics tools
- Go to the Pattern and profile searches
- Click on PPSearch
- Select and run this program
- Copy and paste the pyruvate kinase sequence into the box
- Click Yes under Include Abundant Patterns
- The six PS links provide information on the patterns
- Additional links are found on the pattern page
- The PRU link provides the basic rule
- The PDOC link provides a paragraph of information
Sequence /ebi/extserv/old-work/ppsearch-20100111-0657419760.input (530 residues):

Matching pattern PS00001 ASN_GLYCOSYLATION:
  74: NFSH
  Total matches: 1

Matching pattern PS00005 PKC_PHOSPHO_SITE:
  40: TAR
  59: TLK
  86: TIK
  138: TLK
  204: SKK
  221: SEK
  364: TAK
  419: SYK
  433: SGR
  458: TAR
  523: TMR
  Total matches: 11

Matching pattern PS00006 CK2_PHOSPHO_SITE:
  3: SHSE
  24: TFLE
  59: TLKE
  92: TATE
  194: TEVE
  221: SEKD
  268: SKIE
  340: TRAE
  Total matches: 8

Matching pattern PS00008 MYRISTYL:
  45: GIICTI
  67: GMNVAR
  121: GLIKGS
  125: GSGTAE
  199: GGFLGS
  203: GSKKGV
  288: GIMVAR
  344: GSDVAN
  414: GSVEAS
  517: GSGFTN
  Total matches: 10

Matching pattern PS00016 RGD:
  293: RGD
  Total matches: 1

Matching pattern PS00110 PYRUVATE_KINASE:
  264: IKIISKIENHEGV
  Total matches: 1

Total no of hits in this sequence: 32
Domains SMART Style
§ Go to the SMART site
§ Submit the pyruvate kinase sequence for analysis under the PFAM domains
Transmembrane Regions
Some
proteins have one or more portions of their amino acid sequence embedded within
the lipid bilayer. These embedded portions of the sequence must be largely
hydrophobic. There are bioinformatic programs that are able to predict the
hydrophobicity of an amino acid sequence. Again, both
ExPASY and Biology Workbench have some of these programs
available.
BIOLOGY WORKBENCH
contains three programs for determining regions of hydrophobicity in a
protein and potential membrane spanning domains. Enter the Biology Workbench
and select your sequence under Protein Tools. Next select one of the programs
listed below.
GREASE allows you to generate a Kyte-Doolittle
hydropathy profile. This does not predict secondary structure, so it is not
limited to alpha-helical transmembrane domains. Numbers greater
than 0 indicate increased hydrophobicity; numbers less than 0 indicate
increased hydrophilicity.
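A hydropathy profile of the kind GREASE plots is a sliding-window average of Kyte-Doolittle values. The sketch below is not GREASE itself. The window length is a parameter whose default of 9 is an assumption, not a value taken from GREASE. Each average is reported at the centre residue of its window, and an odd window length is assumed. `hydropathy_profile` is an illustrative name. Letters outside the 20 standard amino acids raise a KeyError.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9):
    """Return (1-based centre residue, average hydropathy) for each full window.

    Averages above 0 indicate hydrophobic stretches, below 0 hydrophilic ones.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        average = sum(values[start:start + window]) / window
        profile.append((start + half + 1, average))
    return profile


# First 30 residues of P00698 (signal peptide plus the start of the mature chain)
n_terminus = "MRSLLILVLCFLPLAALGKVFGRCELAAAM"
for centre, value in hydropathy_profile(n_terminus, window=9):
    print(centre, round(value, 2))
```

The windows covering the hydrophobic signal peptide (residues 1-18) give positive averages. Plotting the pairs gives a curve of the same kind as the GREASE output.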
TMHMM allows you to predict the location of transmembrane alpha helices
and of the intervening loop regions. This program will also predict
which loops between the helices will be on the inside or outside of the cell or
organelle. This program will not detect beta sheet transmembrane domains. It
takes about 20 amino acids in an alpha helix to span a lipid bilayer. Programs
can detect these transmembrane domains by looking for stretches of about 20
amino acids, rich in hydrophobic residues, that could form an alpha helix.
TMAP
uses a Kyte-Doolittle Hydropathy Profile to detect transmembrane spanning
domains. This does not require that the domain be an alpha helix, as in TMHMM.
It also provides the amino acid numbers for the transmembrane domain. This is
especially useful for detecting signal peptides. A signal peptide is a short
hydrophobic sequence at the amino terminus of eukaryotic proteins targeted for
the endoplasmic reticulum and often for secretion.
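Transmembrane candidates can also be flagged directly from hydropathy values. The idea is to mark stretches where a window of about 19 residues averages above a cutoff. A 19-residue window with a cutoff of about 1.6 are the values commonly cited from Kyte and Doolittle's work. The sketch below is a simplified stand-in, not TMAP's actual algorithm. It merges overlapping hydrophobic windows into segments and prints residue numbers in the style of the TMAP output. `predict_tm_segments` is an illustrative name, and the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_tm_segments(seq: str, window: int = 19, cutoff: float = 1.6):
    """Return 1-based (start, end) segments covered by windows whose average
    Kyte-Doolittle hydropathy exceeds `cutoff`."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    segments = []
    for start in range(len(values) - window + 1):
        if sum(values[start:start + window]) / window > cutoff:
            first, last = start + 1, start + window
            if segments and first <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], last)  # extend the current segment
            else:
                segments.append((first, last))
    return segments


# Synthetic test: 10 polar residues, 22 hydrophobic residues, 10 polar residues
demo = "K" * 10 + "L" * 22 + "K" * 10
for number, (start, end) in enumerate(predict_tm_segments(demo), 1):
    print(f"TM {number}: {start} - {end} ({end - start + 1})")
```

Because whole windows are reported, segment ends can extend a few residues past the truly hydrophobic stretch. In this demo, residues 11-32 are reported as 6-37. Dedicated predictors such as TMAP and TMHMM refine these boundaries.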
Transmembrane Prediction
A
prediction of hydrophobic regions of proteins is based upon the Hydropathy
Index. Numbers greater than zero indicate hydrophobic character, while values
less than zero indicate hydrophilicity.
Lysozyme
§ Under BIOLOGY WORKBENCH, go to protein tools
§ Select lysozyme
§ Run through GREASE, TMHMM, and TMAP
Grease Output
TMAP Output
PREDICTED TRANSMEMBRANE SEGMENTS
TM 1: 6 - 28 (23)
TMHMM Output
0_LYG_CHIC Length: 211
0_LYG_CHIC Number of predicted TMHs: 1
0_LYG_CHIC Exp number of AAs in TMHs: 20.46703
0_LYG_CHIC Exp number, first 60 AAs: 20.18892
0_LYG_CHIC Total prob of N-in: 0.45484
0_LYG_CHIC POSSIBLE N-term signal sequence
0_LYG_CHIC TMHMM2.0 outside 1 9
0_LYG_CHIC TMHMM2.0 TMhelix 10 32
0_LYG_CHIC TMHMM2.0 inside 33 211
Rhodopsin
§ Under BIOLOGY WORKBENCH, go to protein tools
§ Use NDJINN to locate files containing the protein sequences of rhodopsin (HSU49742)
§ Use the GBPRI to only search primate sequences
§ Import this sequence and run through GREASE, TMHMM, and TMAP
Grease Output
TMAP Output
PREDICTED TRANSMEMBRANE SEGMENTS
TM 1: 45 - 71 (27)
TM 2: 75 - 99 (25)
TM 3: 114 - 142 (29)
TM 4: 150 - 178 (29)
TM 5: 203 - 231 (29)
TM 6: 257 - 277 (21)
TM 7: 284 - 304 (21)
TMHMM Output
0_1236136_123613 Length: 348
0_1236136_123613 Number of predicted TMHs: 7
0_1236136_123613 Exp number of AAs in TMHs: 157.99471
0_1236136_123613 Exp number, first 60 AAs: 21.69013
0_1236136_123613 Total prob of N-in: 0.00977
0_1236136_123613 POSSIBLE N-term signal sequence
0_1236136_123613 TMHMM2.0 outside 1 38
0_1236136_123613 TMHMM2.0 TMhelix 39 61
0_1236136_123613 TMHMM2.0 inside 62 73
0_1236136_123613 TMHMM2.0 TMhelix 74 96
0_1236136_123613 TMHMM2.0 outside 97 110
0_1236136_123613 TMHMM2.0 TMhelix 111 133
0_1236136_123613 TMHMM2.0 inside 134 152
0_1236136_123613 TMHMM2.0 TMhelix 153 175
0_1236136_123613 TMHMM2.0 outside 176 201
0_1236136_123613 TMHMM2.0 TMhelix 202 224
0_1236136_123613 TMHMM2.0 inside 225 253
0_1236136_123613 TMHMM2.0 TMhelix 254 276
0_1236136_123613 TMHMM2.0 outside 277 285
0_1236136_123613 TMHMM2.0 TMhelix 286 308
0_1236136_123613 TMHMM2.0 inside 309 348
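When comparing predictors in your report, it can help to pull the helix boundaries out of the TMHMM text automatically. The sketch below assumes topology lines shaped like the rhodopsin output above: an identifier, TMHMM2.0, a label such as TMhelix, then start and end. It compares each TMHMM helix with the TMAP segment of the same rank. That pairing assumes both programs found the same number of segments, which holds for this rhodopsin run. `tm_helices` and `overlap` are illustrative names.

```python
# Topology lines copied from the rhodopsin TMHMM output above
TMHMM_TEXT = """\
0_1236136_123613 TMHMM2.0 outside 1 38
0_1236136_123613 TMHMM2.0 TMhelix 39 61
0_1236136_123613 TMHMM2.0 inside 62 73
0_1236136_123613 TMHMM2.0 TMhelix 74 96
0_1236136_123613 TMHMM2.0 outside 97 110
0_1236136_123613 TMHMM2.0 TMhelix 111 133
0_1236136_123613 TMHMM2.0 inside 134 152
0_1236136_123613 TMHMM2.0 TMhelix 153 175
0_1236136_123613 TMHMM2.0 outside 176 201
0_1236136_123613 TMHMM2.0 TMhelix 202 224
0_1236136_123613 TMHMM2.0 inside 225 253
0_1236136_123613 TMHMM2.0 TMhelix 254 276
0_1236136_123613 TMHMM2.0 outside 277 285
0_1236136_123613 TMHMM2.0 TMhelix 286 308
0_1236136_123613 TMHMM2.0 inside 309 348
"""

# TMAP segments for the same sequence, copied from the TMAP output above
TMAP_SEGMENTS = [(45, 71), (75, 99), (114, 142), (150, 178),
                 (203, 231), (257, 277), (284, 304)]


def tm_helices(tmhmm_text):
    """Collect (start, end) for every TMhelix line of TMHMM output."""
    helices = []
    for line in tmhmm_text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[2] == "TMhelix":
            helices.append((int(fields[3]), int(fields[4])))
    return helices


def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


helices = tm_helices(TMHMM_TEXT)
# Pairs segments by rank; assumes both programs found the same number
for tmhmm_seg, tmap_seg in zip(helices, TMAP_SEGMENTS):
    print("TMHMM", tmhmm_seg, "TMAP", tmap_seg, "shared residues:", overlap(tmhmm_seg, tmap_seg))
```

For rhodopsin every TMHMM helix shares residues with a TMAP segment. The boundaries, however, differ by several residues.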
Report for Unit 4 (50 points total)
We
would like a formal written report with the following information. Don't paste
in the questions; these are just to help you be organized. You can create
figures in your report by right clicking on an image, and then copying and
pasting it into your report. Don't add lots of extra output (e.g. names and
accession numbers from Biology Workbench), just the figure and a figure legend,
and then explain what it means in your well-written, rational report.
1.
Perform a BLASTP on your assigned sequence against the PDBFINDER
(sequence of the protein from a crystal structure) and SWISSPROT-HUMAN
(sequence from the DNA) databases in
Biology Workbench (you can select both simultaneously using the Ctrl
key) or use
BLASTP at the ExPASY site directly.
Include a brief one-paragraph description of the protein you were assigned:
what is it, what does it do, is it in any biochemical pathways, in what
organs is it found, is it intra- or extracellular, does it have a signal
peptide or transmembrane domains, etc. You can use a number of sites to
identify the function of your assigned protein (PROSEARCH, PPSEARCH,
SMART, GREASE, TMHMM, TMAP, ExPASY UniPathway, ExPASY Enzyme, etc.).
To help guide you, use the information on patterns, motifs, etc. below.
Patterns and motifs
Use either PPSEARCH or PROSEARCH to find patterns and small motifs. You can use
these in either ExPASY or Biology Workbench Style.
Domains
Use the SMART site to search for functional domains on your sequence.
Transmembrane Domains
Use GREASE, TMHMM, and TMAP to predict any transmembrane
domains. Use the Biology Workbench site for this.
Include output from these programs to support your discussion.
ExPASY UniPathway, ExPasy Enzyme
Use these sites under the ExPASY homepage to help identify your protein
(and its pathway, if it is a metabolic enzyme). ExPASY UniPathway may provide
information on a pathway for your protein, while ExPASY Enzyme provides
information on your enzyme; if you also use the
Biochemical Pathways link you can get an image of where your
enzyme functions. The Related tools and databases under the Enzyme
link have other sources of information as well.
2.
PyMOL images of the 3D structure of your protein, drawn as a cartoon, in
three forms.
A. One form with the secondary structures colored by type.
Include a description comparing what the secondary structure prediction
output predicted with what the actual 3D structure depicts in your image A.
B. A second image drawn as a cartoon with at least one representative of
each pattern and motif found using PPSEARCH OR PROSEARCH.
Each motif/pattern should have its own color, with its sidechains shown as
sticks in the matching color.
Include a description of the motifs found and highlighted in your image B.
C. A third image (if possible) that uses the SMART output to color code
multiple domains independently. This may not be possible for all provided
amino acid sequences. If one or more domains are predicted, color each one
separately on a cartoon drawing.
Include a description of the domains found and their functional or
structural importance to your protein as drawn in image C.
3. A description of
the family of proteins to which your protein belongs (paralogs, Lab 4.2). For
this portion of the unit we would like you to identify and import 6-7 related
human protein sequences (not 6-7 different sequences of the same protein) and
align these sequences. Try to choose several paralogs, not just the most
closely related, but don't go much below a score of 100 or the alignments won't
be very good. If there is a known motif for the class of protein your protein
falls into, look for this motif in your aligned sequences.
Include a picture of
the amino acid sequence alignment and the 3D alignment from Consurf (Lab
4.3). Which regions seem to be conserved, and is this consistent with what
you know about the active site or binding site of the protein?
4. A description of
the evolution of your protein in different species (orthologs, Lab 4.2). For
this portion of the unit we would like you to identify and import 6-7 protein
sequences of this same protein from different species and align these
sequences. Try to choose some distantly related species for comparison; for
example, can you find this protein in yeast or bacteria? If there is a known
motif for the class of protein your protein falls into, look for this motif in
your aligned sequences.
Include a picture of
the amino acid sequence alignment and the 3D alignment from Consurf (Lab
4.3). Which regions seem to be conserved, and is this consistent with what
you know about the active site or binding site of the protein? If both the
paralog and ortholog alignments worked, include both in your final report. If
not, only include the one that seems to have worked well.
5. A conclusion
summarizing your findings. Specifically, comment on the following:
Where is the most sequence
similarity seen on the 3D structural alignments? Were orthologs or paralogs
more highly conserved? Is this consistent with the relative functions of
orthologs and paralogs?
Include how these
bioinformatics tools have helped you to better understand the evolution and
function of your assigned protein. If there were limitations to the
programs, mention these as well, e.g. how accurate were the bioinformatics
programs you used in predicting motifs, secondary structure, etc.
Note: When writing this report do not simply attach the output from
various programs at the end. You MUST embed all output within the report,
near where you discuss its relevance. This will take some
organizational work on your part. It makes no sense to talk about the patterns,
motifs, and domains on page 1 of the report and have the supporting output on
page 5. I will NOT accept reports written where the images from various
programs are simply added on to the end.