Overview

DILIMOT is a server for finding short, over-represented peptide patterns, or linear motifs, in a set of proteins. The method is described in full in Neduva et al., PLoS Biology, 3, e405, 2005.

DILIMOT is not a server for studying very closely related proteins. It is not intended for multiple sequence alignment, or for identifying long stretches of similar sequence common to a set of proteins (for which tools such as BLAST, CLUSTAL or MUSCLE are better suited). It is useful for sets of proteins that share some common feature, such as an interaction partner or cellular localisation, that is not apparent from these other approaches. DILIMOT seeks features that are too short to be found by standard alignment procedures. Our studies suggest that the kinds of features DILIMOT uncovers are not normally homologous (as globular domains normally are), but are likely to be the result of convergence owing to a common functional requirement.

To do this, the method first removes those parts of the sequence not likely to contain linear motifs (based on an analysis of known motifs), makes the sequences non-redundant in terms of sequence similarity, finds and scores over-represented motifs, and presents the results to the user.

What you need to provide as input

A. Sequences and formatting

The central idea behind the server is that you have a set of sequences sharing something in common, either a binding partner or some other general feature like cellular location, and you want to find out if a linear motif can explain it. Note that these motifs are usually too short to be found by more traditional sequence searching methods like BLAST, or domain resources like SMART or Pfam.

The method requires at least three sequences, though it is only fully reliable with four or more. True motifs can be found with three sequences, but they are often not statistically significant and are less often the best-ranked motif. The sequences you provide should ideally be non-homologous, or at least contain regions of dissimilar sequence. The reason is that instances of these motifs are not normally homologous to each other. Instead, they are thought to arise convergently. Presenting the method with homologous sequences also gives rise to many hundreds of motifs arising purely from homology, and these are more likely to reflect a common overall structure than a short peptide stretch conferring a particular function.

Sequences should be in FASTA format. Multiple sequences in a file are separated by title lines beginning with a ">" character, with the sequence in one-letter amino acid codes following on as many lines as are necessary. For example:


>gi|51702266|sp|P62993|GRB2_HUMAN Growth factor receptor-bound protein 2 
MEAIAKYDFKATADDELSFKRGDILKVLNEECDQNWYKAELNGKDGFIPKNYIEMKPHPWFFGKIPRAKA
EEMLSKQRHDGAFLIRESESAPGDFSLSVKFGNDVQHFKVLRDGAGKYFLWVVKFNSLNELVDYHRSTSV
SRNQQIFLRDIEQVPQQPTYVQALFDFDPQEDGELGFRRGDFIHVMDNSDPNWWKGACHGQTGMFPRNYV
TPVNRNV

>gi|15718763|ref|NP_203524.1| c-K-ras2 protein isoform a [Homo sapiens]
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQ
YMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIP
FIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGCVKIKKCIIM
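
If you want to check a file before submitting it, the following Python fragment is a minimal sketch of how FASTA text of the kind shown above could be read and counted. The function name read_fasta and the toy sequences are our own illustrations and are not part of DILIMOT; the server's own parser may behave differently.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (title, sequence) pairs."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            # Sequence lines are joined; case is normalised to upper.
            chunks.append(line.upper())
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


# Toy input (hypothetical titles and shortened sequences).
example = """>seq1 first toy protein
MEAIAKYDFK
ATADDE
>seq2 second toy protein
MTEYKLVVVG
"""

records = read_fasta(example)
for title, seq in records:
    print(title, len(seq))

# The help text above asks for at least three sequences.
if len(records) < 3:
    print("fewer than three sequences: too few for the server")
```

The check at the end only mirrors the minimum of three sequences stated above; it says nothing about whether the sequences are non-homologous.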

We have also provided a few example sequence sets for people to try. These are mostly those that we uncovered successfully in our previous studies.

If you only have a single protein, and want to see if it binds to other proteins containing a new linear motif, then you will first need to get together a set of proteins that interact with your protein of interest. If you are curious about whether your protein contains a new linear motif, you need to find sets to which your protein belongs (that is, other proteins sharing some common attribute with your protein). We are currently developing a server to help create these sets of proteins. In the meantime, you can look for interaction partners in one of several interaction databases or software tools.

You will essentially need to get together a set of sequences in FASTA format to feed to the server.

B. Parameters

1. Sequence filtering: removing regions unlikely to contain motifs

The four tick boxes allow you to select those parts of the sequence that can be removed. By default the program will remove domains as identified by SMART and Pfam and will leave only one copy of homologous regions found in the sequences. Users can optionally add a filter to remove other globular regions predicted by GlobPlot. It is also possible to switch off filtering or alter what is filtered. We would, however, recommend caution when doing this, as results can be unpredictable, or motifs can be found that have no bearing on a common function, arising solely because long homologous stretches are found in the sequences.

2. Boosting confidence in motifs via evolutionary conservation

The 'Your species' selection allows DILIMOT to incorporate evolutionary conservation into the scoring of putative motifs. Important instances of linear motifs are normally preserved across closely related species (e.g. Human to Mouse; D. melanogaster to D. pseudoobscura; C. elegans to C. briggsae). The SCONS score is a weighted combination of P-values for all species considered. In our experience, true motifs become much more significant when conservation is considered.

3. Parameters for motif finding through TEIRESIAS

TEIRESIAS is an algorithm that finds motifs within a set of protein sequences. It allows a number of parameters to be specified, delimiting the nature of the motifs reported. These are:


We would recommend that you not tinker too much with the default values, unless you have a very good reason to do so.

Retrieving results

Results will be displayed in the main window when ready. If you gave an email address, you will also be sent a link to them (we recommend doing this).

Results table

If the procedure has found any motifs, then these will be displayed in a table, showing:

Clicking on each motif takes you to another page showing information about where the motif is found in the sequences, the other features in the sequence (domains, etc.) and how each instance of the motif is conserved in other species.

You can also select different cutoffs (for some of the above numbers) for the motifs to be displayed. You need simply to choose the new selection criteria and click 'Apply new selection'.

Graphical overview of motifs in the table

Below the table listing the motifs there is a button that allows you to get a quick overview of the motifs present in your sequences. By default, this will just show the domain bubblegrams for all the sequences, with the location of the best five motifs shown. You can alter the selection by either changing the display parameters, or by selecting particular motifs using the tick boxes found to the left of each of them and clicking 'Apply new selection' (i.e. the display always shows the motifs in the table that is currently on screen).

Motif information provided in the output pages

Clicking on motifs in the table takes you to a new page showing more details of a particular motif. Each sequence containing the putative motif is shown at the top of the page, with the amino acid sequences on the left of the page. Here the location(s) of the motif instance(s) are underlined and in red. If orthologous sequences were found, a short alignment of the motif-containing regions is shown. Orthologous sequences are marked with an asterisk if the motif instance is conserved.
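
If you want to locate instances of a reported motif in your own sequences outside the server, the following Python fragment is a minimal sketch of one way to do it. It assumes that the motif can be written as a regular expression in which "." stands for any residue; this notation, the function name find_instances, the pattern "P..P" and the toy sequence are illustrative assumptions and are not taken from DILIMOT output.

```python
import re


def find_instances(pattern, sequence):
    """Return (start, end, matched_text) tuples, 1-based and inclusive."""
    # Wrapping the pattern in a lookahead lets overlapping
    # instances be reported as well as non-overlapping ones.
    regex = re.compile("(?=(" + pattern + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(sequence)
    ]


# Toy sequence and pattern (both hypothetical).
toy = "MAPPSPAPPLPR"
hits = find_instances("P..P", toy)
for start, end, text in hits:
    print(start, end, text)
```

Positions are counted from 1, as is usual for protein sequences. Note that this only finds where a pattern matches; it does not reproduce the conservation analysis or scoring carried out by the server.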

The right of the page shows the proteins as domain bubblegrams, with domains identified by SMART or Pfam, globular segments (purple bars), and the location of motif instance(s) as red line(s).

How does one know if a motif is real?

We provide some pointers as to how to find real motifs here.

Run times

Typical DILIMOT jobs of 5-30 sequences usually run in less than five minutes, with very simple jobs (e.g. 5 comparatively short sequences) running in a matter of seconds. Actual online performance depends on the server load, and on other jobs running within the Russell Group services. Users who haven't received anything within 24 hours should contact us (see below).

Citing DILIMOT

When referring to results from this server, please cite:

V. Neduva, R. Linding, I. Su-Angrand, A. Stark, F. de Masi, T.J. Gibson, J. Lewis, L. Serrano, R.B. Russell, Systematic discovery of peptides mediating protein interaction networks. PLoS Biology, 3, e405, 2005.

Contact and questions

DILIMOT was developed by Victor Neduva and Rob Russell at EMBL, Heidelberg.
Please address questions, comments or bug reports to Victor Neduva.