Tandem Repeat Identification
UW-Madison
Beyond Tandem Repeats: Complex Patterns and Regions of Similarity
Amy Hauth and Deborah Joseph

This paper was presented at the 10th International Conference on Intelligent Systems for Molecular Biology (ISMB 2002), held August 3-7, 2002 in Edmonton, Canada. The proceedings were published as a supplemental issue of the journal Bioinformatics.


Abstract

Motivation

Tandem repeats (TRs) are associated with human disease, play a role in evolution and are important in regulatory processes. Despite their importance, locating and characterizing these patterns within anonymous DNA sequences remains a challenge. In part, the difficulty is due to imperfect conservation of patterns and complex pattern structures. We study recognition algorithms for two complex pattern structures: variable length tandem repeats (VLTRs) and multi-period tandem repeats (MPTRs).
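For readers unfamiliar with the basic object, the following is a minimal Python sketch of detecting perfect (exactly conserved) tandem repeats with a fixed period. It is not the algorithm from the paper: it does not handle substitutions, insertions, deletions, VLTRs or MPTRs, which are exactly the cases that make the problem hard. The function name and the parameters `max_period` and `min_copies` are hypothetical and chosen only for illustration.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, period, copies) for runs of perfectly repeated units.

    Illustrative only: handles exact repeats with a single fixed period.
    """
    hits = []
    n = len(seq)
    for period in range(1, max_period + 1):
        i = 0
        while i + period <= n:
            # Extend the run while each base equals the base one period earlier.
            j = i + period
            while j < n and seq[j] == seq[j - period]:
                j += 1
            copies = (j - i) // period
            unit = seq[i:i + period]
            # A unit is primitive if it is not itself a repeat of a shorter
            # string; this avoids reporting "AAAAAA" again with period 2.
            primitive = unit not in (unit + unit)[1:-1]
            if copies >= min_copies and primitive:
                hits.append((i, period, copies))
            # Any run starting before j - period + 1 would stop at the same
            # mismatch, so skip directly past it.
            i = j - period + 1
    return hits


print(find_exact_tandem_repeats("GGACACACACTT", max_period=3))
# prints [(2, 2, 4)]: the unit "AC" starts at index 2 and occurs 4 times
```

A single substitution inside the run would split it into two shorter hits or remove it entirely, which shows why imperfect conservation calls for more tolerant recognition methods.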

Results

We extend previous algorithmic research to a class of regular tandem repeats (RegTRs). We formally define RegTRs as well as two important subclasses: VLTRs and MPTRs. We present algorithms for identifying TRs in these classes. Furthermore, our algorithms identify degenerate VLTRs and MPTRs: repeats containing substitutions, insertions and deletions. To illustrate our work, we present the results of our analysis of two difficult regions, one in cattle and one in human, which reflect practical occurrences of these subclasses in GenBank sequence data.

In addition, we show the applicability of our algorithmic techniques for identifying Alu sequences, gene clusters and other distant regions of similarity. We illustrate this with an example from yeast chromosome I.


Additional Links

  • Full Text: PDF
  • Analysis: Sequences