Strain classification of genomic data using variation graphs and gene ranking
Abstract
Genomics is the study of an organism’s genetic information and of how its traits and characteristics are developed and inherited. A common problem in genomic analysis is identifying and classifying new strains of pathogens. For pathogenic bacteria, for example, this can help with outbreak analysis, prediction, and prevention. Existing high-resolution classifiers operate at the species level, but for organisms with high rates of gene transfer, classification at the sub-species (strain) level remains challenging. In this work, we implement a bioinformatics pipeline that tests several metrics, one of them novel, for identifying specific loci of the foodborne disease pathogen Campylobacter jejuni that may be associated with particular strains. The pipeline is highly adaptable to user-provided bacterial genome data, and it shows how certain tools can be combined with our metric approach to classify novel strains into cluster groups derived from the user-provided metadata.