In this paper, we present an algorithm to quickly identify conserved patterns in a set of aligned protein sequences. Using contribution statistics, the proposed method identifies a motif describing the given set of sequences. It is flexible enough to identify variable-length wildcard regions, as well as motif elements based on regions containing amino acids with similar physicochemical properties. We compare its performance against other well-known motif-discovery algorithms on three datasets: snake toxins, insulin proteins, and methylated-DNA protein-cysteine methyltransferase (MGMT) active-site enzymes. When tested on 91 neurotoxin protein sequences from 45 species of elapid snakes, the algorithm generated a motif with 97% precision. The motifs generated for the insulin family and the MGMT family of proteins had precisions of 92% and 96.5%, respectively. Our algorithm is fast and efficient, outperforms commonly used motif-generation algorithms on average in terms of accuracy, and, unlike some other algorithms, always reports a motif.

Keywords

motif generation; PROSITE; protein families; patterns; motifs; flexible wildcard regions; snake toxins; insulin
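
As a rough illustration of the kind of output described in the abstract, the sketch below derives a PROSITE-like pattern from a small set of aligned sequences. It is not the contribution-statistics method of the paper: it uses a much simpler column-wise rule (a fully conserved residue, a set of residues from one physicochemical group, or a wildcard), and the residue groupings, the function names `column_element` and `build_motif`, and the toy sequences are all assumptions made for this example.

```python
# Hypothetical physicochemical groupings (an assumption, not taken from the paper).
GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
    set("STNQ"),   # polar, uncharged
]


def column_element(column):
    """Return a motif element for one alignment column, or None for a wildcard."""
    residues = set(column)
    if "-" in residues:
        return None  # columns containing gaps are treated as wildcards
    if len(residues) == 1:
        return next(iter(residues))  # fully conserved residue
    for group in GROUPS:
        if residues <= group:  # all residues share one physicochemical group
            return "[" + "".join(sorted(residues)) + "]"
    return None


def build_motif(aligned):
    """Build a PROSITE-like pattern from equal-length, gap-padded sequences."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length sequences")

    elements = [column_element(col) for col in zip(*aligned)]
    defined = [i for i, e in enumerate(elements) if e is not None]
    if not defined:
        return ""  # no conserved position at all
    first, last = defined[0], defined[-1]  # trim leading/trailing wildcards

    parts = []
    i = first
    while i <= last:
        if elements[i] is not None:
            parts.append(elements[i])
            i += 1
            continue
        j = i
        while elements[j] is None:  # terminates: elements[last] is not None
            j += 1
        # Wildcard length per sequence = non-gap residues in columns i..j-1.
        lengths = [sum(1 for c in s[i:j] if c != "-") for s in aligned]
        lo, hi = min(lengths), max(lengths)
        if hi == 1 and lo == 1:
            parts.append("x")
        elif lo == hi:
            parts.append(f"x({lo})")
        else:
            parts.append(f"x({lo},{hi})")  # variable-length wildcard region
        i = j
    return "-".join(parts)


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = [
    "CAKW-TDE",
    "CVKPGSDE",
    "CLRHGTDD",
]
toy_motif = build_motif(toy_alignment)
print(toy_motif)
```

In this toy alignment, the two unconserved columns (one of which contains a gap) collapse into a single variable-length wildcard, while columns whose residues fall in one assumed group become bracketed alternatives. A statistical method such as the one the abstract describes would presumably tolerate occasional mismatches in a column, which this strict rule does not.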