
For this lab report, describe the details and outcomes of your in silico experiments,
focusing on the most informative results for your assigned gene from: 1) an analysis of
protein domains; 2) an examination of protein-protein interaction (PPI) data, followed by
a GO enrichment analysis using AgriGO as described below; 3) the identification of
interesting gene expression patterns, used to help guide function prediction; 4) a
coexpression analysis against an expression database of your choice, followed by a GO
enrichment analysis using AgriGO; 5) the identification of any known cis-elements in the
promoters of these genes; and 6) the use of integrative tools like
GeneMANIA (genemania.org; Warde-Farley et al., 2010, Nucl. Acids Res.) or AraNet
(www.functionalnet.org/; Lee et al., 2010, Nature Biotechnol.). Your report should be 6
pages of Times Roman 12 point or Arial 11 point text, double-spaced, not including
references and figures. Exactly 3 figures (of your choosing, with informative legends)
are required.
Have fun, but do think scientifically and logically about your questions and approaches.
And do, of course, visit PubMed, browse some abstracts in an area of interest, and even look
at a couple of papers before formulating your function hypothesis. You should cite at least
2 primary sources plus all tools/websites/data sources used, using the journal Nature’s
reference format.
Introduction (6 marks) ~ ¾ page
Introduce your gene of interest and state: 1) What question are you asking, and why?
This could be as simple as “What is the function of this protein?” if its function is
unknown. Or it could be a more specific aim, such as characterizing the function via the
examination of protein domains and protein-protein interaction data. 2) How did you
attempt to answer your question? You could also mention findings from your first lab
report here.
Methods (8 marks) ~ ¾ page
A succinct description of your analysis with enough information for someone to
reproduce your result. It’s enough to say “I retrieved all interactions for my gene of
interest from Arabidopsis PPI data in BioGRID, filtering by a minimum evidence level of
3 and minimum interaction level of 2”, for instance. Be sure to cite actual, published
papers for web sources of data and tools.
Results and Discussion (16 marks) ~ 3.5 pages for Results / ~ 1 page for Discussion
State the results of the experiment and their significance. Focus your “story” around your
3 figures but do mention your other analyses. What results did you find? What can you
infer or conclude from your data: what might your gene do, in other words? Negative
results can also be informative. State what you learned and what you might ask/check
next if given the time and interest. Attempt to provide a hypothesis for in vivo activity
and propose a follow-up experiment, be it in silico or in vivo, to test this hypothesis.

Potential Data sources and Tools*
Domain Analysis
Use tools described in Class 7, e.g. InterProScan.
Protein-Protein Interactions
Arabidopsis Interactions Viewer: http://bar.utoronto.ca/interactions/
BioGRID (as per Class 8): http://thebiogrid.org/
Note: if you only get a few interactors for your assigned gene, try repeating your query
including those interactors along with your query gene/protein product, to identify
interactors of interactors. This might be necessary to do a meaningful GO enrichment
analysis.
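The "interactors of interactors" idea above can be sketched as a two-step expansion over a list of interaction pairs. This is only a minimal illustration: the gene identifiers are hypothetical placeholders, the function names are made up for this sketch, and in practice the pairs would come from a table you download from one of the resources above.

```python
from collections import defaultdict

def build_adjacency(pairs):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj

def expand_interactors(adj, query, depth=2):
    """Return {gene: distance} for all genes within `depth` steps of `query`."""
    seen = {query: 0}
    frontier = [query]
    for distance in range(1, depth + 1):
        next_frontier = []
        for gene in frontier:
            for neighbour in adj.get(gene, ()):
                if neighbour not in seen:
                    seen[neighbour] = distance
                    next_frontier.append(neighbour)
        frontier = next_frontier
    del seen[query]  # report only the interactors, not the query itself
    return seen

# Hypothetical toy data (placeholder identifiers, not real interactions).
pairs = [
    ("GENE_Q", "GENE_A"),
    ("GENE_Q", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_B", "GENE_D"),
    ("GENE_D", "GENE_E"),
]
adj = build_adjacency(pairs)
network = expand_interactors(adj, "GENE_Q", depth=2)
print(sorted(network.items()))
```

With depth=1 you get only the direct interactors; depth=2 adds their interactors, which may give a gene list large enough for a GO enrichment analysis.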
GO Enrichment Analysis
A useful tool for performing GO enrichment analysis on lists of Arabidopsis genes or
proteins is AgriGO (Du et al., 2010, Nucleic Acids Research; doi:
http://dx.doi.org/10.1093/nar/gkq310), as per Class 11. It is available at
http://systemsbiology.cau.edu.cn/agriGOv2/classification_analysis.php?category=Plant&&family=Brassicaceae.
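To build intuition for what a GO enrichment p-value means, here is a minimal sketch of a one-sided hypergeometric test, a statistic commonly used for this purpose. It is not AgriGO's own code, all counts are hypothetical, and it omits the multiple-testing correction that real tools apply across many GO terms.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the GO term."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 27000 background genes, 150 annotated with the term,
# a query list of 50 genes, 5 of which carry the term.
p_value = hypergeom_enrichment_p(k=5, n=50, K=150, N=27000)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value suggests the term is over-represented in your list relative to the background; remember that the choice of background set changes the result.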
Expression Analysis
Bio-Analytic Resource eFP Browser: http://bar.utoronto.ca/efp/. In which Data Source is
expression the strongest or most leptokurtic (i.e. most sharply peaked) in a particular
sample, cell type, or tissue?
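A leptokurtic expression profile is one where a few samples stand far above an otherwise flat background, which shows up as positive excess kurtosis. The sketch below computes population excess kurtosis for two hypothetical profiles; the values are invented for illustration and the function name is an assumption of this example.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; kurtosis is undefined")
    return m4 / m2 ** 2 - 3

# Hypothetical expression values across ten samples.
peaked_profile = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]   # high in one sample only
broad_profile = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]     # spread across samples

print(excess_kurtosis(peaked_profile))
print(excess_kurtosis(broad_profile))
```

The peaked profile gives a positive value and the broad one a negative value, so comparing this number across Data Sources is one rough way to find where expression is most specific.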
Coexpression tools
Limit your output to the top 50 coexpressed genes (or fewer). It might be more useful to
do a condition-dependent analysis, limiting the samples that you are using to specific
aspects of biology, e.g. abiotic stress if your gene of interest seems to be induced in
response to abiotic stress.
Expression Angler:
http://bar.utoronto.ca/ntools/cgi-bin/ntools_expression_angler.cgi
ATTED-II, Arabidopsis thaliana coexpression database: http://atted.jp
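The ranking step behind a coexpression query can be sketched as follows, assuming Pearson correlation as the similarity measure and a hypothetical expression table. The optional list of sample indices mimics a condition-dependent analysis; the gene names, values, and function names are placeholders for this illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / (sd_x * sd_y)

def top_coexpressed(expr, query, samples=None, top_n=50):
    """Rank all other genes by correlation with `query`.
    `samples` optionally restricts the analysis to chosen sample indices."""
    idx = list(range(len(expr[query]))) if samples is None else list(samples)
    query_profile = [expr[query][i] for i in idx]
    scores = []
    for gene, profile in expr.items():
        if gene == query:
            continue
        scores.append((gene, pearson(query_profile, [profile[i] for i in idx])))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]

# Hypothetical expression table: gene -> values across four samples.
expr = {
    "GENE_Q": [1, 2, 3, 4],
    "GENE_A": [2, 4, 6, 8],
    "GENE_B": [4, 3, 2, 1],
    "GENE_C": [1, 3, 2, 4],
}
for gene, r in top_coexpressed(expr, "GENE_Q", top_n=2):
    print(gene, round(r, 2))
```

Passing, say, only the indices of abiotic stress samples as `samples` restricts the correlation to that aspect of biology, which can change the ranking considerably.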
Cis-element identification / mapping
Athena: http://bioinformatics1.smb.wsu.edu/cgi-bin/Athena/cgi/home.pl
Cistome: http://bar.utoronto.ca/cistome_legacy/cgi-bin/BAR_Cistome.cgi
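As a rough picture of what cis-element mapping involves, the sketch below scans a promoter sequence for exact matches to a motif on both strands. The promoter string is a made-up placeholder; the G-box (CACGTG) is used only as a familiar example. Real tools such as those above also handle degenerate motifs and assess statistical over-representation, which this sketch does not.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_motif(promoter, motif):
    """Return sorted (position, strand) hits, with 0-based positions on the
    forward strand. Overlapping matches are reported via a lookahead."""
    promoter = promoter.upper()
    motif = motif.upper()
    hits = [(m.start(), "+")
            for m in re.finditer(f"(?={re.escape(motif)})", promoter)]
    rc = reverse_complement(motif)
    if rc != motif:  # palindromic motifs would otherwise be counted twice
        hits += [(m.start(), "-")
                 for m in re.finditer(f"(?={re.escape(rc)})", promoter)]
    return sorted(hits)

# Hypothetical promoter fragment (not a real sequence).
promoter = "TTCACGTGAA"
print(find_motif(promoter, "CACGTG"))  # G-box, palindromic
print(find_motif(promoter, "ACGTG"))   # non-palindromic core, both strands
```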
Integrative online prediction programs
GeneMANIA: http://genemania.org/
AraNet: http://www.inetbio.org/aranet/
* All of these tools have papers associated with them. Cite them if you use them!