Chilibot user manual
Introduction
Chilibot is a specialized search tool for the PubMed literature database. It is designed to rapidly identify relationships between genes, proteins, or any keywords of interest to the user.
In contrast to the PubMed interface, where results are organized by article, Chilibot directly presents the key information the user is seeking, i.e. sentences containing both terms. These sentences are organized into different relationship types based on linguistic analysis of the text.
In addition, Chilibot is especially suited to batch processing a large number of terms (e.g. microarray results). The relationships are summarized as a graph, with links to the sentences describing the relationships, as well as to the terms themselves.
Many advanced options are available, including color coding the terms, editing the synonyms (e.g. gene/protein names), and context restricted search. Chilibot also automatically suggests new hypotheses based on information in the literature. Therefore, we recommend using Chilibot whenever you are searching for relationships in PubMed (i.e. whenever you would put two keywords in the PubMed search box). Some examples of "two term search" that rapidly retrieve information include:
BDNF & TRKB (What is the relationship between these two proteins?)
BDNF & apoptosis (What is the relationship between a protein and a keyword?)
BDNF & hippocampus (Is BDNF expressed in the hippocampus?)
BDNF & phosphorylation
BDNF & polymorphism
The following more advanced example queries the pairwise relationships among a list of genes, and the relationship of each gene with three keywords (note the special keyword SECONDLIST), with additional user-supplied synonyms. This example uses the pairwise search function to do a two-list search. A simpler interface for two-list search is also provided for beginning users.
BDNF
TRKB
TRKC
CHRNA7
PSD95
CREB
HPRT
ARC
NUR77
SECONDLIST
APOPTOSIS (programmed cell death; PCD)
Hippocampus
STEM CELLS
Registration
We try to make Chilibot as easy to use as possible, so registration is not required. Unregistered users have access to the same functionality as registered users. Registration offers the following benefits: registered users provide an identity that lets us keep their results separate from those of others; they do not need to type in their email address for queries with more than five terms; and their results are protected by a password. In addition, registered users can keep their results as long as they like, while results for unregistered users are deleted after a month to conserve disk space.
User input
Chilibot accepts two types of terms: a) Gene / protein symbols. Because genes and proteins have a large number of synonyms, the best option is the official symbol for each model organism. Other symbols work as well but may retrieve less information. b) Keywords. You can use any keyword, but be aware that short acronyms may have many different meanings. To avoid ambiguity, you can provide the acronym as well as the full name (see user defined synonyms for details).
Color coding
The default node color is cyan. You can change it by adding a number after each term.
Example:
BDNF 0.4
TRKB 1.3
APOPTOSIS 1.0
Numbers greater than 1 correspond to varying shades of green, while numbers between 0 and 1 correspond to varying shades of red. Only fold values (i.e. positive numbers) are allowed. Negative numbers, usually log(FoldChange) values, can be converted into folds by taking their antilog.
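The antilog conversion can be sketched in a few lines of Python. This is only an illustration: the logarithm base (2 here) is an assumption, so use whatever base your analysis software reports, and the function names `to_fold` and `format_term` are hypothetical helpers, not part of Chilibot.

```python
def to_fold(log_fold_change, base=2.0):
    """Convert a log(FoldChange) value into a positive fold value (antilog).

    The base is an assumption; many microarray tools report log2 values.
    """
    return base ** log_fold_change


def format_term(term, log_fold_change, base=2.0):
    """Return one Chilibot input line: the term followed by its fold value."""
    return f"{term} {to_fold(log_fold_change, base):.2f}"


if __name__ == "__main__":
    # Hypothetical log2 values, for illustration only.
    for term, lfc in [("BDNF", -1.0), ("TRKB", 1.0), ("APOPTOSIS", 0.0)]:
        print(format_term(term, lfc))
```

A log value of 0 maps to a fold of 1, negative log values map to folds between 0 and 1 (red), and positive log values map to folds above 1 (green).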

User defined synonyms
You can provide your own synonyms for Chilibot to use. All synonyms are combined using Boolean "OR" before PubMed is queried. To supply synonyms, put them in parentheses after the term, on the same line. Use ';' to separate different synonyms. For example:
APOPTOSIS (programmed cell death; PCD)
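To make the input format concrete, the following Python sketch splits such a line into the term and its synonyms and joins them with Boolean OR. It is not Chilibot's own parser: the functions `parse_term_line` and `or_query` are hypothetical, and the exact query string Chilibot sends to PubMed (including the quoting of multi-word names) is an assumption.

```python
import re


def parse_term_line(line):
    """Split 'TERM (syn1; syn2)' into the term and a list of synonyms."""
    match = re.match(r"^\s*([^()]+?)\s*(?:\((.*)\))?\s*$", line)
    if match is None:
        raise ValueError(f"cannot parse term line: {line!r}")
    term = match.group(1)
    raw_synonyms = match.group(2) or ""
    synonyms = [s.strip() for s in raw_synonyms.split(";") if s.strip()]
    return term, synonyms


def or_query(term, synonyms):
    """Combine a term and its synonyms with Boolean OR (quoting is assumed)."""
    names = [term] + synonyms
    quoted = [f'"{name}"' if " " in name else name for name in names]
    return "(" + " OR ".join(quoted) + ")"


if __name__ == "__main__":
    term, synonyms = parse_term_line("APOPTOSIS (programmed cell death; PCD)")
    print(or_query(term, synonyms))
```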
Types of search
Two term search
This is the simplest form of Chilibot search. It retrieves sentences describing the relationship between two terms.
One list pairwise search

Pairwise queries are exhaustive searches of all possible relationships between any two terms in a list.
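As a rough illustration of what "exhaustive" means here, a list of n terms yields n(n-1)/2 term pairs. The Python sketch below enumerates them; `pairwise_queries` is a hypothetical helper and says nothing about how Chilibot schedules the queries internally.

```python
from itertools import combinations


def pairwise_queries(terms):
    """Return every unordered pair of distinct terms in the list."""
    return list(combinations(terms, 2))


if __name__ == "__main__":
    for first, second in pairwise_queries(["BDNF", "TRKB", "CREB"]):
        print(f"{first} & {second}")
```

Because the number of pairs grows quadratically with the list length, long lists take noticeably longer to process.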

Two list search
Chilibot conducts pairwise queries by default. Sometimes, however, users are interested in one-to-many relationships. For example, a user may want to find the relationship between one term (e.g. polymorphism) and a list of genes, while the interactions among the genes themselves are not relevant. The two list search is a variant of the pairwise search that accepts two lists. Terms in the first list are searched pairwise against each other, and each term in list 1 is also searched against each term in list 2. However, terms in list 2 are not searched against each other. This can be done using the two list search boxes. Alternatively, you can use the pairwise search box and separate your two lists with the special keyword 'SECONDLIST'.
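The pairing rule for the two list search can be sketched as follows. This is a minimal Python illustration under the rules stated above; `split_lists` and `two_list_queries` are hypothetical names, and only the 'SECONDLIST' keyword comes from Chilibot itself.

```python
from itertools import combinations, product


def split_lists(lines, marker="SECONDLIST"):
    """Split the input lines into list 1 and list 2 at the marker keyword."""
    cleaned = [line.strip() for line in lines if line.strip()]
    if marker in cleaned:
        index = cleaned.index(marker)
        return cleaned[:index], cleaned[index + 1:]
    return cleaned, []


def two_list_queries(list1, list2):
    """List 1 terms pair with each other and with every list 2 term.

    Terms in list 2 are never paired with each other.
    """
    return list(combinations(list1, 2)) + list(product(list1, list2))


if __name__ == "__main__":
    text = "BDNF\nTRKB\nSECONDLIST\nHippocampus\nSTEM CELLS\n"
    first, second = split_lists(text.splitlines())
    for pair in two_list_queries(first, second):
        print(" & ".join(pair))
```

With n terms in list 1 and m terms in list 2, this gives n(n-1)/2 + n*m pairs, compared with (n+m)(n+m-1)/2 for a single pairwise list.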
Context restricted search
By default, Chilibot searches the entire PubMed database for relationships. However, users may wish to restrict the search to a subset of PubMed to increase the relevance of the results. This can be achieved by providing a few keywords in the 'context keyword' box. Chilibot then ignores abstracts that do not contain these keywords. If multiple keywords are provided, please connect them with Boolean operators.
For example, suppose we want to know whether polymorphisms in a list of genes are associated with a particular disease. We can then do a two list search (context = disease name; list 1 = polymorphism; list 2 = list of genes). Of course, you can also include phosphorylation, glycosylation, etc. in list 1.
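The effect of a context restriction can be pictured with a small Python sketch. How Chilibot applies the context internally is not described in this manual, so the function below is only a conceptual model: `matches_context` is a hypothetical helper that handles a flat list of keywords joined by a single AND or OR, not arbitrary Boolean expressions.

```python
def matches_context(abstract, keywords, operator="AND"):
    """Return True if the abstract satisfies the context keywords.

    'AND' requires every keyword to occur; 'OR' requires at least one.
    Matching is a case-insensitive substring test (a simplification).
    """
    text = abstract.lower()
    hits = [keyword.lower() in text for keyword in keywords]
    if operator == "AND":
        return all(hits)
    if operator == "OR":
        return any(hits)
    raise ValueError(f"unsupported operator: {operator!r}")


if __name__ == "__main__":
    # Toy abstract text, for illustration only.
    abstract = "A BDNF polymorphism was examined in schizophrenia patients."
    print(matches_context(abstract, ["gene", "protein"], operator="OR"))
    print(matches_context(abstract, ["schizophrenia"]))
```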
"Modulate" and "Modulation" are special keywords in Chilibot. When they are used, Chilibot searches for words such as 'inhibition', 'stimulation', 'increase', 'reduce', etc. that are indicative of regulatory relationships. This is very useful if you are interested in finding genes or proteins that have a regulatory relationship with your keyword (or protein) but you do not know what they are. For example, suppose you want to know "Which genes are involved in synaptic plasticity?" You can restrict the context to "gene OR protein" and search for the relationship between "synaptic plasticity" and "modulation". Chilibot will then search for abstracts containing gene/protein, and find sentences that contain 'synaptic plasticity' together with keywords indicative of modulation (e.g. inhibition, stimulation, phosphorylation ...). You can then read the sentences to find the names of the genes. We are working on a function that will produce this list automatically.
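The idea behind the modulation keyword can be sketched in Python as a sentence filter. This is not Chilibot's linguistic analysis: the stem list below is derived only from the example words mentioned above and is an assumption, as are the helper names `split_sentences` and `modulation_sentences` and the naive sentence splitter.

```python
import re

# Stems taken from the manual's example words; Chilibot's real list is unknown.
MODULATION_STEMS = ("inhibit", "stimulat", "increas", "reduc")


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def modulation_sentences(text, term):
    """Return sentences that mention the term and a modulation-like word."""
    wanted = term.lower()
    found = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if wanted in lowered and any(stem in lowered for stem in MODULATION_STEMS):
            found.append(sentence)
    return found


if __name__ == "__main__":
    # Toy text, for illustration only.
    text = ("BDNF increases synaptic plasticity. "
            "Synaptic plasticity was measured. "
            "Stress reduces appetite.")
    for sentence in modulation_sentences(text, "synaptic plasticity"):
        print(sentence)
```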
Working with the results
Basic navigation
The results are presented to the user as a summary graph, with square boxes representing the queried terms and lines between the boxes representing the relationships. There is an icon (a circle or a rhomboid) in the middle of each line, representing the kind of relationship. Clicking on a box opens a portal site for the represented term. Clicking on a circle or rhomboid brings up the sentences describing the relationship.
Re-graphing
We provide several methods for reorganizing the graph, including selectively excluding some less significant relationship types. Users can also generate sub-network graphs focused on one selected node and all its connections. To do so, click on the node to bring up its portal page (on the right side). Then click on the 'Draw simple graph in radiant' combination to bring up the new graph on the left side of the screen. Other available options include more complicated sub-network graphs and different graph layout algorithms.
A hypothesis is generated when a particular node (A) has no direct relationship with node C, but A is directly connected to at least one other node (e.g. B) that is directly connected to C. The system then hypothesizes that A and C are related. To generate hypotheses for any term, click on the node in any graph to bring up its portal page. Then click on the 'TERM might be related to ..' button.
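The A-B-C rule described above can be sketched as a short graph traversal in Python. This illustrates the stated rule only, not Chilibot's implementation: `suggest_hypotheses` is a hypothetical function and the edge list stands in for the relationships found in a search.

```python
def suggest_hypotheses(edges, node):
    """Suggest terms indirectly related to `node`.

    A term C is suggested when `node` has no direct edge to C but shares
    at least one neighbor B with it. Returns {C: set of linking terms B}.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    direct = neighbors.get(node, set())
    hypotheses = {}
    for linker in direct:
        for candidate in neighbors[linker]:
            if candidate != node and candidate not in direct:
                hypotheses.setdefault(candidate, set()).add(linker)
    return hypotheses


if __name__ == "__main__":
    # Abstract node names, matching the A, B, C description.
    edges = [("A", "B"), ("B", "C"), ("A", "D")]
    for target, linkers in suggest_hypotheses(edges, "A").items():
        print(f"A might be related to {target} via {sorted(linkers)}")
```

Keeping the linking terms alongside each suggestion makes it easy to go back to the sentences that support each half of the inferred relationship.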
The optimal use of synonyms is critical to the overall performance of Chilibot. Chilibot provides a list of synonyms collected from several genomic/proteomic databases. Nomenclatures are semi-manually curated before being compiled into a large database, but the result may not fit your needs. Thus, all synonyms are shown to the user before the queries are conducted and can be edited at that time. However, if you are searching for a large number of terms, it is not practical to verify all the synonyms manually. In that case, you can just hit the "next" button and let our algorithm do the job. After the query is done, you can review the text (sentences, with the synonyms highlighted). If you find that a particular synonym is causing multiple false relationships, you can click on the "Edit synonym / update session" button to edit the synonyms. The query will be updated after you hit the "next" button.
Names with fewer than three letters are particularly prone to errors, so we have flagged some of them with "!". These flagged names are excluded from the query by default. You can include them in the query by deleting the exclamation mark.
Deleting relationships
If for some reason an automatically identified relationship is not interesting to the user (because it is either erroneous, or accurate but not relevant), it can be deleted. Deleted relationships cannot be restored individually. However, if the user elects to use the 'edit synonym/update session' function, the deleted relationships may reappear. We plan to make that an option instead of the default behavior in the next version.
Automating your searches
This new version of Chilibot supports limited scripting. The following simple example searches for the relationships among three terms.
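#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use URI::Escape qw(uri_escape);

# Provide your email address so that you receive a notification when a query is done (if more than 6 terms are queried).
my $email = 'me@my.domain';
my $sessionName = '';    # session name is optional, e.g. 'testing'
my $terms = "apoptosis\ncreb\nbdnf\n";    # one term per line

searchChilibot($email, $sessionName, $terms);

sub searchChilibot {
    my ($email, $sessionName, $terms) = @_;
    # URL-encode each parameter so that '@', spaces, and newlines are transmitted intact
    my $url = "http://www.chilibot.net/cgi-bin/chilibot/chilibot.cgi"
        . "?email=" . uri_escape($email)
        . "&IN=t&list=" . uri_escape($terms)
        . "&name=" . uri_escape($sessionName);
    print "Waiting for Chilibot response (may take a while) ..\n";
    my $response = get($url);
    # get() returns undef when the request fails
    if (!defined $response) {
        print "error: no response from the server\n";
        return;
    }
    # On success, the response links to the index page of the results
    if ($response =~ m|Done!.+?<a href=(.+index\.html)|) {
        print "search is done: http://www.chilibot.net$1\n";
    }
    # Warnings from Chilibot are reported in a <div class="warning"> element
    if ($response =~ m|<div *class="warning">(.*)</div>|) {
        print "error:$1\n";
    }
}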
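For readers who script in Python, the query URL used by the Perl example can be assembled as sketched below. The parameter names (email, IN, list, name) and the CGI address are taken from the Perl script above; the function name `build_query_url` is hypothetical, and the sketch only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

CHILIBOT_CGI = "http://www.chilibot.net/cgi-bin/chilibot/chilibot.cgi"


def build_query_url(email, terms, session_name=""):
    """Build the Chilibot query URL for a list of terms (one per line)."""
    params = {
        "email": email,
        "IN": "t",
        "list": "\n".join(terms) + "\n",  # newline-separated, as in the Perl script
        "name": session_name,             # session name is optional
    }
    # urlencode percent-encodes '@', spaces, and newlines
    return CHILIBOT_CGI + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("me@my.domain", ["apoptosis", "creb", "bdnf"]))
```

The resulting URL can be fetched with any HTTP client, and the response can be checked for the same "Done!" and warning patterns as in the Perl script.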
Publication
Chen H and Sharp BM. Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics. 2004 Oct 8;5:147.
Cited in:
Mudunuri U, Stephens R, Bruining D, Liu D, Lebeda FJ. botXminer: mining biomedical literature with a new web-based application. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W748-52.
Hunter L, Cohen KB. Biomedical Language Processing: What's Beyond PubMed? Mol. Cell. 2006 Mar 3;21(5):589-94.
Li H, Chen H, Bao L, Manly KF, Chesler EJ, Lu L, Wang J, Zhou M, Williams RW, Cui Y. Integrative genetic analysis of transcription modules: towards filling the gap between genetic loci and inherited traits. Hum Mol Genet. 2006 Feb 1;15(3):481-92.
Douglas SM, Montelione GT, Gerstein M. PubNet: a flexible system for visualizing literature derived networks. Genome Biol. 2005;6(9):R80.
Weeber, M., Kors, J.A., Mons, B. (2005) Online tools to support literature-based discovery in the life sciences. Briefings in Bioinformatics 6(3), pp. 277-286.
Skusa, A., Rüegg, A., Köhler, J. (2005) Extraction of biological interaction networks from scientific literature. Briefings in Bioinformatics 6(3), pp. 263-276.
Scherf, M., Epple, A., Werner, T. (2005) The next generation of literature analysis: Integration of genomic analysis into text mining. Briefings in Bioinformatics 6(3), pp. 287-297.
Buckingham SD. (2005) Data mining for protein-protein interactions in invertebrate model organisms. Invert Neurosci. 5(3-4):183-7.
Martin Krallinger and Alfonso Valencia (2005) Text-mining and information-retrieval services for molecular biology. Genome Biology, 6:224.
Henk Harkema, Ian Roberts, Rob Gaizauskas, and Mark Hepple (2005) A web service for biomedical term look-up. Comparative and Functional Genomics, Volume 6, Issue 1-2, Pages 86-93.
Kerns RT, Ravindranathan A, Hassan S, Cage MP, York T, Sikela JM, Williams RW, Miles MF (2005) Ethanol-responsive brain region expression networks: implications for behavioral responses to acute ethanol in DBA/2J versus C57BL/6J mice. J Neurosci. 25(9):2255-66.
Hoffmann R, Krallinger M, Andres E, Tamames J, Blaschke C, Valencia A. (2005) Text mining for metabolic pathways, signaling cascades, and protein networks. Sci STKE. (283):pe21.
Parslow, Websites of Note, Biochemistry and Molecular Biology Education. 2005; 33: 310-312.
Nikolsky Y, Nikolskaya T, Bugrim A. Biological networks and analysis of experimental data in drug discovery. Drug Discov Today. 2005 May 1;10(9):653-62.
Jorg Hakenberg, Conrad Plake, Ulf Leser, Harald Kirsch, and Dietrich Rebholz-Schuhmann. LLL'05 Challenge: Genic Interaction Extraction - Identification of Language Patterns Based on Alignment and Finite State Automata. Proceedings of the 4th Learning Language in Logic Workshop (LLL05), Bonn, Germany, 2005.