What you need to paste into the main window

Sequences and formatting

The central idea behind the server is that you have a set of sequences that share something in common, either a binding partner or a more general feature such as cellular location, and you want to find out whether a linear motif can explain it. Note that these motifs are usually too short to be found by more traditional sequence searching methods like BLAST, or by domain resources like SMART or Pfam.

The method requires at least three sequences, though it is only fully reliable with four or more. True motifs can be found with three sequences, but they are often not significant and are less often the best-ranked motif. The sequences you provide should also ideally be non-homologous, or at least contain regions of dissimilar sequence. The reason is that instances of these motifs are not normally homologous to each other; instead, they are thought to arise convergently. Presenting the method with homologous sequences also gives rise to many hundreds of motifs arising purely from homology, and these are more likely to reflect a common overall structure than a short peptide stretch conferring a particular function.
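The size guidance above can be checked before submitting a set. The following is only a minimal sketch: the function name check_set_size and its three labels are hypothetical and are not part of the server.

```python
def check_set_size(sequences):
    """Classify a sequence set by the size guidance described above.

    The labels are illustrative only (an assumption, not server output):
      "too_few" - fewer than three sequences, below the stated minimum
      "minimal" - exactly three; motifs may be found but are less reliable
      "ok"      - four or more, the range described as fully reliable
    """
    n = len(sequences)
    if n < 3:
        return "too_few"
    if n == 3:
        return "minimal"
    return "ok"
```

This check only counts sequences. It does not test for homology, which would need a separate pairwise comparison of the sequences.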

Sequences should be in FASTA format. Multiple sequences in a file are separated by title lines beginning with a ">" character, with the sequence in one-letter amino acid codes following on as many lines as necessary. For example:

>gi|51702266|sp|P62993|GRB2_HUMAN Growth factor receptor-bound protein 2 
>gi|15718763|ref|NP_203524.1| c-K-ras2 protein isoform a [Homo sapiens]
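To check that a file follows this layout before pasting it in, a small parser along the following lines may help. It is a sketch under stated assumptions: the function name parse_fasta is hypothetical, blank lines are skipped, and residues are upper-cased. It is not the parser used by the server.

```python
def parse_fasta(text):
    """Return a list of (title, sequence) tuples from FASTA-formatted text.

    A record starts at a line beginning with ">" and its sequence is the
    concatenation of all following lines up to the next title line.
    """
    records = []
    title = None   # title of the record currently being read
    chunks = []    # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:].strip()
            chunks = []
        elif title is None:
            raise ValueError("sequence data found before the first '>' title line")
        else:
            chunks.append(line.upper())
    if title is not None:
        records.append((title, "".join(chunks)))
    return records
```

A title line with no sequence lines after it yields an empty sequence string, so incomplete records are easy to spot by checking for empty second elements in the returned tuples.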

We have also provided a few example sequence sets for people to try. Most of these are sets whose motifs we uncovered successfully in our previous studies.