Tandem repeats (TRs) are associated with human disease, play a role in evolution and are important in regulatory processes. Despite their importance, locating and characterizing these patterns within anonymous DNA sequences remains a challenge. In part, the difficulty is due to imperfect conservation of patterns and complex pattern structures. We study recognition algorithms for two complex pattern structures: variable length tandem repeats (VLTRs) and multi-period tandem repeats (MPTRs).
Results
We extend previous algorithmic research to a class of regular tandem repeats (RegTRs). We formally define RegTRs, as well as two important subclasses: VLTRs and MPTRs. We present algorithms for identifying TRs in these classes. Furthermore, our algorithms identify degenerate VLTRs and MPTRs: repeats containing substitutions, insertions and deletions. To illustrate our work, we present the results of our analysis of two difficult regions in cattle and human, which reflect practical occurrences of these subclasses in GenBank sequence data.
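As a rough illustration of what a variable length repeat can look like, the sketch below uses a regular expression to find exact runs of a toy pattern in which each copy is "AC", then one or more "G", then "T", so that successive copies differ in length. The pattern, the function name find_exact_vltrs and the example sequence are all assumptions made for this illustration; they are not the formal definitions or the algorithms described here, and the sketch does not handle degenerate repeats.

```python
import re

# Hypothetical toy pattern (an assumption for illustration only):
# each copy is "AC", then one or more "G", then "T".
COPY_PATTERN = re.compile(r"ACG+T")
# A run is two or more adjacent copies; copies may differ in length.
RUN_PATTERN = re.compile(r"(?:ACG+T){2,}")


def find_exact_vltrs(sequence):
    """Return (start, end, copies) for each exact run of two or more copies.

    Only exact matches are reported; substitutions, insertions and
    deletions are not tolerated by this sketch.
    """
    hits = []
    for match in RUN_PATTERN.finditer(sequence):
        copies = COPY_PATTERN.findall(match.group())
        hits.append((match.start(), match.end(), copies))
    return hits


if __name__ == "__main__":
    # Made-up sequence, not taken from GenBank.
    for start, end, copies in find_exact_vltrs("TTACGTACGGGTACGGTCCA"):
        print(start, end, copies)
```

An exact regular expression of this kind breaks down as soon as a copy contains a substitution or an indel, which is why identifying degenerate repeats requires approximate matching rather than a fixed pattern.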
In addition, we show that our algorithmic techniques are applicable to identifying Alu sequences, gene clusters and other distant regions of similarity. We illustrate this with an example from yeast chromosome I.