- Comparing sequences without using alignments: application to HIV/SIV subtyping

Current HIV sequence analyses usually involve obtaining a gap-stripped multiple alignment in order to construct an unbiased phylogenetic tree. The alignment is usually first generated by HMMER [20,21] and/or other alignment software, and then edited manually [22]. After the ambiguously aligned positions are deleted, the final alignment retains only about half of the sequence length [13]. This procedure is time-consuming and, most importantly, the alignment quite often underestimates sequence variability, especially the variability embedded in the ambiguously aligned positions.
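To make the cost of gap-stripping concrete, the following minimal Python sketch removes every alignment column that contains a gap and reports how much of the alignment survives. The function name `gap_strip` and the toy alignment are illustrative assumptions; they are not the HMMER-based pipeline or the data used in the paper, and real pipelines may also drop columns judged ambiguous by manual inspection.

```python
def gap_strip(alignment, gap_chars="-."):
    """Drop every column that contains a gap character in at least one row.

    `alignment` is a list of equal-length aligned sequences (toy input here).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    # Indices of columns that are gap-free in every row.
    keep = [i for i in range(length)
            if all(row[i] not in gap_chars for row in alignment)]
    return ["".join(row[i] for i in keep) for row in alignment]


# Hypothetical three-sequence alignment, for illustration only.
toy_alignment = [
    "ATG-CCGTA",
    "ATGACC-TA",
    "A-GACCGTA",
]
stripped = gap_strip(toy_alignment)
retained_fraction = len(stripped[0]) / len(toy_alignment[0])
print(stripped, round(retained_fraction, 2))
```

In this toy case three of the nine columns are discarded, and the three rows become identical after stripping, which illustrates how variability located in gapped regions can disappear from the final alignment.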

In this paper, we introduced a tree-building method that does not require a sequence alignment. This N-local-decoding method calculates sequence dissimilarity matrices, based on re-writing, and re-classifies the input sequences. Our HIV/SIV subtyping results showed that the classifications produced by this method agree very well with those obtained by a combination of standard methods. Comparing biological sequences without alignments thus appears to be an alternative way to better explore sequence relationships.
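As a rough illustration of what an alignment-free dissimilarity matrix looks like, the sketch below compares sequences by the sets of k-mers they contain, using the Jaccard distance. This is a generic, swapped-in technique chosen for brevity; it is not the N-local-decoding method, and the function names, the value of `k`, and the toy sequences are all assumptions made for the example.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_dissimilarity(a, b, k):
    """1 - |shared k-mers| / |all k-mers|; 0.0 if neither sequence has a k-mer."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def dissimilarity_matrix(seqs, k):
    """Symmetric matrix of pairwise k-mer dissimilarities, zero on the diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_dissimilarity(seqs[i], seqs[j], k)
    return d


# Hypothetical sequences of different lengths; no alignment is needed.
toy = ["ACGTACGTGA", "ACGTACGTGACC", "TTGGCCAATT"]
matrix = dissimilarity_matrix(toy, k=3)
for row in matrix:
    print(row)
```

A matrix of this kind can be passed to any distance-based tree-building procedure. Note that the sequences do not need to have the same length, which is one of the practical advantages of alignment-free comparison.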

However, there are some discrepancies between the trees calculated by the N-local-decoding method and those obtained from standard methods. These differences may simply suggest that the N parameter used in the N-local-decoding method needs to be better defined, or they may be a consequence of our including ambiguous regions that traditional methods often ignore.

The N-local-decoding method is particularly useful for analyzing sequence variety and for tracking sequence evolutionary events when a good sequence alignment is not possible. The method is also meaningful from an evolutionary point of view. Its success in sequence subtyping relies on capturing perfect or imperfect repeats and conserved regions in sequences, that is, similarity blocks that are either closely or remotely related; the remotely related blocks often go undetected by traditional methods because ambiguous alignment regions are removed. The similarity blocks include internal repeats within one sequence and conserved regions among sequences, and these blocks do not necessarily appear in the same order in the original sequences. In our method, inversions could be detected by including the reverse-complement sequences in the sequence set.
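Adding reverse-complement sequences to the input set is straightforward. The sketch below shows one way to do it in Python; the helper names and the `_rc` naming suffix are hypothetical conventions for this example, and the augmented set would then be passed to whatever comparison method is in use.

```python
# Translation table for the four standard DNA bases (both cases).
# Ambiguity codes such as N are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_reverse_complements(named_seqs):
    """Return a new dict holding each sequence plus its reverse complement.

    Reverse complements are stored under the original name plus '_rc'
    (an assumed naming convention, not one taken from the paper).
    """
    augmented = dict(named_seqs)
    for name, seq in named_seqs.items():
        augmented[name + "_rc"] = reverse_complement(seq)
    return augmented


# Hypothetical input set, for illustration only.
sequences = {"s1": "AAGC", "s2": "ATGCN"}
augmented = with_reverse_complements(sequences)
print(augmented)
```

With the augmented set, a block that occurs in inverted orientation in one sequence can match the corresponding `_rc` entry of another sequence.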

This method is also practical in terms of computing time and ease of use. All the calculations in this paper were completed within a few seconds on a regular PC. This speed is due to the quality of the algorithm [8], whose complexity is linear in the total length of the sequence set.

The N value, the only parameter used in this method, could be set empirically according to the N values listed in the Results section.

Our method thus provides an alternative way of constructing sequence trees. It helps to track sequence information embedded in ambiguous alignment regions. In addition, the ability to compare sequences of varied lengths suggests that it could be used directly to detect sequence recombinants. Finally, the similarity blocks found by this method could also serve as anchor points for similarity-block-based alignment programs to refine the quality of the alignment.

updated on: 12 Aug 2009
