MAST - Output

MAST -- Motif Alignment and Search Tool

Motif search tool


MAST sends you two e-mail messages:
  • a confirmation message and
  • your search results.

The e-mail messages and how MAST computes match scores and their statistical significance (p-values) are explained in the following sections. A sample search results file is also provided.



  • Confirmation message

    The first e-mail message you receive should be a confirmation that your search request has been received. It looks something like this:
    Subject: MAST confirmation: alcohol dehydrogenase motifs
     
    Your MAST search request 14019 is being processed:
    Motif file: adh
    Database to search: SwissProt
    
    If you fail to receive the confirmation message, check your e-mail address and try resubmitting your MAST request.

  • Search Results

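    The second e-mail message you should receive contains the results of the MAST search. The results file is divided into the sections described below (Database and Motifs, High-scoring Sequences, Motif Diagrams and Annotated Sequences), and each section contains an explanation of how to interpret it.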

  • Match Scores

    The match score of a motif at a position in a sequence is the sum of one entry from each row of the position-dependent scoring matrix: the entry in the column for the sequence letter aligned with that row. For example, if the sequence is
    TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC
       ========
    
    and the motif is represented by the position-dependent scoring matrix (where each row of the matrix corresponds to a position in the motif)
    =========|=================================
    POSITION |   A        C        G        T
    =========|=================================
      1      | 1.447    0.188   -4.025   -4.095
      2      | 0.739    1.339   -3.945   -2.325
      3      | 1.764   -3.562   -4.197   -3.895
      4      | 1.574   -3.784   -1.594   -1.994
      5      | 1.602   -3.935   -4.054   -1.370
      6      | 0.797   -3.647   -0.814    0.215
      7      |-1.280    1.873   -0.607   -1.933
      8      |-3.076    1.035    1.414   -3.913
    =========|=================================
    
    then the match score of the fourth position in the sequence (underlined) would be found by summing the score for T in position 1, G in position 2 and so on until G in position 8. So the match score would be
      score = -4.095 + -3.945 + -3.895 + -1.994
            + -4.054 + -0.814 + -1.933 +  1.414
            = -19.316
    
    The match scores for other positions in the sequence are calculated in the same way. Match scores are only calculated if the match completely fits within the sequence. Match scores are not calculated if the motif would overhang either end of the sequence.
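
    The calculation above can be reproduced with a short script. This is a minimal sketch, not MAST's own code: the function names `match_score` and `all_match_scores` are invented for illustration, and positions are numbered from 1 to match the text.

```python
# Position-dependent scoring matrix from the example: one row per motif
# position, columns in the order A, C, G, T.
ALPHABET = "ACGT"
MATRIX = [
    [1.447, 0.188, -4.025, -4.095],
    [0.739, 1.339, -3.945, -2.325],
    [1.764, -3.562, -4.197, -3.895],
    [1.574, -3.784, -1.594, -1.994],
    [1.602, -3.935, -4.054, -1.370],
    [0.797, -3.647, -0.814, 0.215],
    [-1.280, 1.873, -0.607, -1.933],
    [-3.076, 1.035, 1.414, -3.913],
]
SEQUENCE = "TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC"


def match_score(sequence, start, matrix=MATRIX):
    """Score the motif against `sequence` beginning at 1-based position `start`."""
    width = len(matrix)
    if start < 1 or start + width - 1 > len(sequence):
        raise ValueError("motif would overhang the end of the sequence")
    window = sequence[start - 1:start - 1 + width]
    # Add the entry for each letter from the matrix row it is aligned with.
    return sum(row[ALPHABET.index(letter)] for row, letter in zip(matrix, window))


def all_match_scores(sequence, matrix=MATRIX):
    """Return {start: score} for every position where the motif fits completely."""
    last_start = len(sequence) - len(matrix) + 1
    return {s: match_score(sequence, s, matrix) for s in range(1, last_start + 1)}


print(round(match_score(SEQUENCE, 4), 3))  # -19.316
```

    `all_match_scores` only visits start positions where the whole motif fits, so no score is produced for positions where the motif would overhang the end of the sequence.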

  • P-values

    MAST reports all matches of a sequence to a motif or group of motifs in terms of the p-value of the match. MAST considers the p-values of four types of events.

    All p-values are based on a random sequence model that assumes each position in a random sequence is generated according to the average letter frequencies of all sequences in the appropriate (peptide or nucleotide) non-redundant database (ftp://ncbi.nlm.nih.gov/blast/db/) on September 22, 1996.
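
    As an illustration of the random sequence model, the sketch below computes a p-value for a single match score using the matrix from the Match Scores example. It rests on several assumptions that are not taken from MAST itself: the p-value is taken to be the probability that a random window scores at least as high as the observed score, the uniform letter frequencies in `BACKGROUND` are placeholders for the database averages, and brute-force enumeration is used only because it is easy to follow. The name `position_p_value` is invented for this example.

```python
from itertools import product

ALPHABET = "ACGT"
# Placeholder background frequencies (assumption). The model described above
# uses the average letter frequencies of the non-redundant database instead.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Scoring matrix from the Match Scores example (columns A, C, G, T).
MATRIX = [
    [1.447, 0.188, -4.025, -4.095],
    [0.739, 1.339, -3.945, -2.325],
    [1.764, -3.562, -4.197, -3.895],
    [1.574, -3.784, -1.594, -1.994],
    [1.602, -3.935, -4.054, -1.370],
    [0.797, -3.647, -0.814, 0.215],
    [-1.280, 1.873, -0.607, -1.933],
    [-3.076, 1.035, 1.414, -3.913],
]


def position_p_value(observed_score, matrix=MATRIX, background=BACKGROUND):
    """Probability that a random window scores at least `observed_score`.

    Every possible window (4**8 = 65536 for this matrix) is enumerated, with
    each letter drawn independently from `background`.
    """
    p_value = 0.0
    for letters in product(range(len(ALPHABET)), repeat=len(matrix)):
        score = sum(row[i] for row, i in zip(matrix, letters))
        if score >= observed_score - 1e-9:  # tolerance for floating-point sums
            probability = 1.0
            for i in letters:
                probability *= background[ALPHABET[i]]
            p_value += probability
    return p_value


# The match scored in the example (-19.316) is a poor one, so its p-value
# under this model is close to 1.
print(position_p_value(-19.316))
```

    Enumeration is only practical for short motifs over a small alphabet; it is shown here because it makes the link between the letter frequencies and the p-value explicit.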

  • Database and Motifs

    This section shows information on the database that was searched and the motifs in the search query. The database section gives the date the database was last updated as well as the number of sequences and total sequence characters in it. The motifs are listed by motif number. For each motif, its width and the subsequence that would be given the best possible score are shown. If there is more than one motif in the query, all pairwise correlations between the motifs are shown. The correlations can range from -1 to +1, with +1 meaning that the shorter motif is exactly identical to part or all of the longer motif. High correlations can cause some combined p-values and e-values to be inaccurate (too low). It may be advisable to remove enough motifs from the query to ensure that no pairs of motifs have high correlations. Any high correlations are indicated along with the suggestion that one of the motifs be removed from the query.

  • High-scoring Sequences

    MAST lists the names and part of the descriptive text of all sequences whose e-value is less than E. Sequences shorter than one or more of the motifs are skipped. The sequences are sorted by increasing e-value. The value of E is set to 10 for the WEB server but is user-selectable in the down-loadable version of MAST.

    When nucleotide sequences are searched, the strand (+ or -) is indicated. When nucleotide sequences are searched with peptide motifs, the reading frame (a, b or c) of the best matches is also indicated. Matches are not all required to be in the same reading frame but must all be on the same strand.

  • Motif Diagrams

    Motif diagrams show the order and spacing of non-overlapping matches to the motifs in each high-scoring sequence. Motif occurrences are determined based on the position p-value of matches to the motif. In the MOTIF DIAGRAMS section of the output, diagrams are shown like this:


    [Figure: block diagram showing matches to motifs 6, 4, 3, 5 and 7 in order along the sequence]

    In the ANNOTATED SEQUENCES section of the output, diagrams are shown like this:

    27-[3]-44-<4>-99-[1]-7
    
    In this notation, strong matches (p-value < M) are shown in square brackets (`[ ]'), weak matches (M < p-value < M × 10) are shown in angle brackets (`< >') and the length of non-motif sequence ("spacer") is shown between dashes (`-'). The example above shows an initial spacer of length 27, followed by a strong match to motif 3, a spacer of length 44, a weak match to motif 4, a spacer of length 99, a strong match to motif 1 and a final non-motif sequence of length 7. The value of M is 0.0001 for the WEB server but is user-selectable in the down-loadable version of MAST.
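
    The notation can be generated from a list of matches as in the sketch below. The function name `motif_diagram`, the tuple layout and the motif widths in the usage example are all hypothetical. The sketch also assumes that matches with a p-value of M × 10 or more are left out of the diagram, that a p-value exactly equal to M counts as weak, and that a spacer of length 0 is still printed; none of these details is stated above.

```python
def motif_diagram(seq_length, matches, strong_threshold=0.0001):
    """Build a diagram string from non-overlapping matches.

    `matches` is a list of (start, width, motif_number, p_value) tuples with
    1-based start positions. `strong_threshold` plays the role of M.
    """
    parts = []
    position = 1  # next sequence position not yet covered by the diagram
    for start, width, motif, p_value in sorted(matches):
        if p_value < strong_threshold:
            token = f"[{motif}]"  # strong match
        elif p_value < strong_threshold * 10:
            token = f"<{motif}>"  # weak match
        else:
            continue  # assumption: weaker matches are not drawn
        parts.append(str(start - position))  # spacer before this match
        parts.append(token)
        position = start + width
    parts.append(str(seq_length - position + 1))  # final spacer
    return "-".join(parts)


# Hypothetical widths: motif 3 is 10 wide, motif 4 is 12, motif 1 is 8.
example = [(28, 10, 3, 1e-6), (82, 12, 4, 5e-4), (193, 8, 1, 1e-5)]
print(motif_diagram(207, example))  # 27-[3]-44-<4>-99-[1]-7
```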

    When nucleotide databases are searched, all matches must be on the same strand and the strand (+ or -) is indicated in the output. When peptide motifs are used to search nucleotide sequences, the reading frame (a, b or c) of each match is indicated next to the motif numbers in the motif diagrams found in the ANNOTATED SEQUENCES section of the output. For example,

    97-[6b]-17-[4a]-36-[3a]-45-[5a]-96-[7a]-59
    
    shows that motif 6 matched in reading frame b while the other motif matches occurred in reading frame a.
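
    Diagrams in this notation are easy to read back into a program. The following sketch handles only the forms described in this section: spacer lengths, strong and weak matches, and an optional reading-frame letter. `parse_motif_diagram` is a hypothetical helper name, not part of MAST.

```python
import re

SPACER = re.compile(r"\d+")
# A match is a motif number with an optional reading frame (a, b or c),
# enclosed in [ ] for a strong match or < > for a weak one.
MATCH = re.compile(r"\[(\d+)([abc]?)\]|<(\d+)([abc]?)>")


def parse_motif_diagram(diagram):
    """Split a diagram into ("spacer", length) and (strength, motif, frame) items."""
    items = []
    for token in diagram.split("-"):
        if SPACER.fullmatch(token):
            items.append(("spacer", int(token)))
            continue
        m = MATCH.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised diagram token: {token!r}")
        strong = token.startswith("[")
        motif, frame = (m.group(1), m.group(2)) if strong else (m.group(3), m.group(4))
        items.append(("strong" if strong else "weak", int(motif), frame or None))
    return items


for item in parse_motif_diagram("97-[6b]-17-[4a]-36-[3a]-45-[5a]-96-[7a]-59"):
    print(item)
```

    The frame is returned as `None` when the diagram carries no reading-frame letter, as in searches that do not involve translating nucleotide sequences.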

  • Annotated Sequences

    MAST annotates each high-scoring sequence by printing the sequence along with the position and strength of all the non-overlapping motif occurrences. The four lines above each motif occurrence annotate that occurrence. The best possible match to a motif is the sequence of letters which would achieve the highest match score.
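
    Because the match score is a sum of one entry per matrix row, the best possible match can be found by taking the highest-scoring letter in each row. The sketch below does this for the scoring matrix from the Match Scores example; `best_possible_match` is an illustrative name, not a MAST function.

```python
ALPHABET = "ACGT"
# Scoring matrix from the Match Scores example (columns A, C, G, T).
MATRIX = [
    [1.447, 0.188, -4.025, -4.095],
    [0.739, 1.339, -3.945, -2.325],
    [1.764, -3.562, -4.197, -3.895],
    [1.574, -3.784, -1.594, -1.994],
    [1.602, -3.935, -4.054, -1.370],
    [0.797, -3.647, -0.814, 0.215],
    [-1.280, 1.873, -0.607, -1.933],
    [-3.076, 1.035, 1.414, -3.913],
]


def best_possible_match(matrix, alphabet=ALPHABET):
    """Return the letter sequence that achieves the highest match score."""
    letters = []
    for row in matrix:
        best_column = max(range(len(row)), key=row.__getitem__)
        letters.append(alphabet[best_column])
    return "".join(letters)


print(best_possible_match(MATRIX))  # ACAAAACG
```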

    When peptide motifs are used to search nucleotide sequences, the reading frame (a, b or c) of each match is indicated with the motif number and the peptide translation of the matching sequence is shown just above the motif occurrence.

  • Sample MAST Search Results

    Here is an actual MAST search results file of a search of a nucleotide database with peptide motifs. It has been edited slightly to reduce its size by removing most of the 832 sequences which matched the motifs.
