De Novo Sequence Assembly of Viral Quasispecies
The rapid replication and high mutation rates of viruses such as HIV produce, within a single infected individual, a community of highly similar genomes known as a viral quasispecies. Next-generation sequencing technologies allow researchers to sequence an entire quasispecies community at lower cost and with less effort than traditional sequencing methods. However, typical sequence assembly software is designed to reconstruct a single genome from sequencing reads, not a community of highly similar genomes. We describe and implement a de novo assembly method that reconstructs the variants in a quasispecies community using de Bruijn graphs and a novel heuristic path-construction method designed to link corresponding variations separated by long distances across the genome. We predict the relative abundance of the reconstructed variants using an approach inspired by Markov chains.
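To illustrate why closely related variants complicate single-genome assembly, the sketch below builds a generic de Bruijn graph (nodes are (k-1)-mers, edges are k-mers) from two hypothetical reads that differ at one position. The shared sequence collapses into common nodes, and the single difference appears as a "bubble": one node with two successors and, later, one node with two predecessors. This is a minimal textbook construction under stated assumptions, not the authors' path-construction or abundance method. The reads, the choice of `k`, and the function names `build_de_bruijn`, `branching_nodes`, and `merging_nodes` are all hypothetical. Edge multiplicities are recorded because per-edge coverage is one kind of signal an abundance estimate could draw on. The Markov-chain-inspired approach itself is not reproduced here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a generic de Bruijn graph (hypothetical helper).

    Nodes are (k-1)-mers; each k-mer in a read contributes an edge from its
    prefix (k-1)-mer to its suffix (k-1)-mer. Returns a dict mapping each
    (prefix, suffix) edge to the number of times that k-mer was observed.
    """
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


def branching_nodes(edges):
    """Return nodes with more than one distinct successor (a bubble opens)."""
    successors = defaultdict(set)
    for prefix, suffix in edges:
        successors[prefix].add(suffix)
    return sorted(node for node, succ in successors.items() if len(succ) > 1)


def merging_nodes(edges):
    """Return nodes with more than one distinct predecessor (a bubble closes)."""
    predecessors = defaultdict(set)
    for prefix, suffix in edges:
        predecessors[suffix].add(prefix)
    return sorted(node for node, pred in predecessors.items() if len(pred) > 1)


# Two hypothetical variants that differ at a single base (A vs G).
reads = ["GATTACAGG", "GATTGCAGG"]
graph = build_de_bruijn(reads, k=4)

print(branching_nodes(graph))      # ['ATT']
print(merging_nodes(graph))        # ['CAG']
print(graph[("GAT", "ATT")])       # 2: this k-mer is shared by both variants
```

A single bubble like this is easy to resolve. The harder problem the abstract points to is deciding which branch of one bubble belongs with which branch of another bubble far along the genome, once shared sequence between them has merged the paths.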