Login

Join for Free!
112030 members
table of contents table of contents

In general, the construction of trees is based on sequence alignments. This …


Biology Articles » Methods & Techniques » Comparing sequences without using alignments: application to HIV/SIV subtyping » Background

Background
- Comparing sequences without using alignments: application to HIV/SIV subtyping

Almost all existing methods for comparing biological sequences are based on certain implicit (and necessary) assumptions about the kinds of changes sequences undergo during their putative evolution. Under these assumptions, some changes like permutations and inversions are often ignored (see however [1-3]) while other kinds of changes (insertions/deletions, for instance) are poorly evaluated because they do not follow regular evolutionary models. As a result, the regions in which these changes occur are often treated as ambiguous and are deleted from sequence alignments before further evolutionary analysis. In addition, as no single alignment method copes with tremendous variety of sequence data, expert editing (manual alignment) becomes often necessary. This is particularly important if the sequences have successive duplications/deletions [4,5], and permutations.

To solve these problems, there has been work on methods of comparing sequences without alignments. An intuitive way of comparing sequences without alignments is to compare their compositions, i.e. the frequency of nucleotides or amino acids appearing in the sequences, as two sequences with significantly different compositions cannot be closely related. However, it is also true that two unrelated sequences can have very similar compositions. So the next step is to consider all N-words in a set of sequences (an N-word consists of N consecutive letters in a sequence) to calculate sequence dissimilarities [6]. This can show some evolutionary relationships. The problem is that the only straightforward relationship between two words is that they are either identical or different. This can be compensated by considering «imperfectly matched words». The problem is then a combinatorial explosion: as we all know, there are too many ways of being imperfect.

To track the evolutionary information embedded in ambiguous regions of sequence alignments and to alleviate the burden of manual editing, one of us developed a method that allows comparison by considering «local decoding of order N» of sequences (also called «N-local decoding method»), and keeping information of each set of overlapping (step 1) N-words that cover each site in a sequence [7]. The method is recapitulated in a recent work describing a new algorithm that computes the «local decoding of order N» of a sequence (or a set of sequences) with a complexity linear in sequence length, both in time and memory space [8].

The method was found useful for assessing a manually constructed HIV (Human Immunodeficiency Virus) LTR (long terminal repeat) multiple alignment, which is homology blocks based [5]. «Related sites» were found to correspond to homology blocks between compared sequences, leading us to think that the comparison of the sequences «composition» in «locally decoded letters» could yield a new method for calculating dissimilarities between sequences without an alignment and consequently without taking into account the order of homology blocks in each sequence.

The new method has also been tested against 10 reference alignment-based dissimilarity matrices calculated from 10 multiple alignments of DNA or RNA sequences: it gives alignment-free dissimilarities in very good agreement with the reference alignment-based dissimilarity matrices (better than the correlation obtained when the alignment-free dissimilarity matrix was calculated from the comparison of sequence compositions in words of length N) [8].

The aims of this paper are: (A) to give a concise description of this new method (see the section Methods) (B) to apply it to a problem of biological interest, namely the classification of HIV/SIV, two of the most variable organisms known so far (see Results and Discussion).


rating: 0.00 from 0 votes | updated on: 12 Aug 2009 | views: 5920 |

Rate article:







excellent!bad…