The preferred sequence format for MEME is Pearson/Fasta format. Each sequence starts with a header line followed by one or more sequence lines. A header line has the character ">" in position one, followed by a unique name without any spaces, followed by (optional) descriptive text. Spaces and blank lines are ignored. Sequences may be in capital or lowercase or both.

MEME uses the first word in the header line of each sequence, truncated to 24 characters if necessary, as the name of the sequence. (The first word in the header line is everything following the ">" up to the first blank.) This name must be unique; sequences with duplicate names will be ignored.
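
The naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not MEME's own reader: the function name read_fasta and its max_name_len parameter are hypothetical, and the sketch assumes that the first sequence with a given name is kept and later duplicates are dropped, and that names are compared after truncation.

```python
def read_fasta(text, max_name_len=24):
    """Parse Pearson/Fasta text into a {name: sequence} dict (file order)."""
    records = {}
    current = None  # name of the record being filled; None while skipping
    for line in text.splitlines():
        if line.startswith(">"):
            # The name is the first word after ">", truncated if necessary.
            # Assumption: whitespace directly after ">" is skipped.
            words = line[1:].split()
            name = words[0][:max_name_len] if words else ""
            if name in records:
                # Assumption: a duplicate name means this sequence is ignored.
                current = None
            else:
                records[name] = []
                current = name
        elif current is not None:
            # Spaces and blank lines inside a sequence are ignored.
            records[current].append("".join(line.split()))
    return {name: "".join(parts) for name, parts in records.items()}
```

Calling read_fasta on a file's contents returns the sequences keyed by their (possibly truncated) names, with upper and lower case preserved.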

Sequence weights may be specified in the dataset file by special header lines where the unique name is "WEIGHTS" (all caps) and the descriptive text is a list of sequence weights. Sequence weights are numbers in the range 0 < w <= 1. All weights are assigned in order to the sequences in the file. If there are more sequences than weights, the remaining sequences are given weight one. Weights may be specified by more than one "WEIGHTS" entry, and these entries may appear anywhere in the file, but you must not put weights on lines that don't start with ">WEIGHTS". When weights are used, sequences contribute to motifs in proportion to their weights. Down-weighting is useful, for example, in a file of three sequences where the first two are very similar and should each count for less than the third.
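
As a rough sketch of how the weight rules fit together, the Python below collects the weights from every ">WEIGHTS" line and assigns them in order to the sequences. The function name read_weights is hypothetical, the sequence data in the example is made up for illustration, and dropping any weights left over after the last sequence is an assumption; duplicate sequence names are not handled here.

```python
def read_weights(text):
    """Return one weight per sequence header, in file order."""
    weights = []      # weights gathered from all >WEIGHTS lines, in order
    n_sequences = 0   # number of ordinary sequence headers seen
    for line in text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines never carry weights
        words = line[1:].split()
        if words and words[0] == "WEIGHTS":
            for token in words[1:]:
                w = float(token)
                if not 0.0 < w <= 1.0:
                    raise ValueError(f"weight out of range 0 < w <= 1: {token}")
                weights.append(w)
        else:
            n_sequences += 1
    # Assumption: surplus weights are dropped. Sequences without a weight
    # are given weight one.
    weights = weights[:n_sequences]
    return weights + [1.0] * (n_sequences - len(weights))


# Illustrative dataset: seq1 and seq2 are very similar, so each is
# down-weighted to 0.5; seq3 has no weight listed and defaults to one.
example = """\
>WEIGHTS 0.5 0.5
>seq1
GDIFYPGYCPDVKPVNDF
>seq2
GDMFCPGYCPDVKPVGDF
>seq3
QKVAGTWYSLAMAASDIS
"""
print(read_weights(example))  # [0.5, 0.5, 1.0]
```

With these weights, seq1 and seq2 together contribute about as much to a motif as seq3 does alone.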

The web version of MEME also accepts protein and DNA sequences in any of the following formats by converting them to Pearson/Fasta format. When using these formats, it is not possible to specify sequence weights.
MEME uses the ReadSeq program to read in sequences. ReadSeq is copyright 1990 by D. G. Gilbert, Biology Dept., Indiana University.