Development and evaluation of a core genome MLST schema for Haemophilus influenzae

Iskander, Mariam
Haemophilus influenzae is a human pathogen that can cause disease in young children and the elderly. Several typing methods exist for H. influenzae, of which serotyping and multilocus sequence typing (MLST) are the two most commonly used. The antigenic properties of the polysaccharide capsule surrounding some H. influenzae are used to classify encapsulated strains into six serotypes (a-f), whereas non-encapsulated strains are considered non-typeable (NTHi). Historically, H. influenzae serotype b (Hib) has been the leading cause of morbidity and mortality worldwide. The introduction of a Hib conjugate vaccine in the 1990s drastically reduced the incidence of Hib disease. In the years since, however, serotype f (Hif) has emerged as the dominant serotype in the general population, while serotype a (Hia) has emerged in the Indigenous populations of North America. Since the Hib vaccine does not protect against non-Hib strains, the rising rates of disease warrant investigation into the development of vaccines for other H. influenzae serotypes. Developing an effective vaccine for serotypeable strains requires an understanding of their population structure; however, the population structure of H. influenzae is currently unclear. Although the 7-gene MLST is commonly used in laboratories worldwide, advances in genome sequencing can provide a far more detailed understanding of the population structure of H. influenzae. This study investigates the utility of a core genome MLST (cgMLST) schema as a potential extended MLST scheme for H. influenzae typing. A total of 314 genomes were used to design the cgMLST schema. Minimum spanning trees were generated based on the 7-gene MLST, the ribosomal protein MLST (rMLST) and cgMLST schemas, and all three schemas were compared using Simpson’s index of diversity (for discriminatory power) and the adjusted Rand and adjusted Wallace coefficients (for concordance).
A single nucleotide variant (SNV) analysis was also performed, and an SNV-based phylogeny was used to compare the concordance of all three methods. The cgMLST schema contained a total of 980 loci and partitioned the H. influenzae genomes into 204 partitions. The cgMLST schema was shown to have higher discriminatory power than the 7-gene MLST and rMLST schemas. Additionally, the cgMLST schema was found to have the highest level of concordance with the SNV-based phylogeny. The results of this study indicate possible capsular switching or loss among H. influenzae. Overall, the cgMLST schema provides higher discriminatory power than the classical 7-gene MLST and rMLST schemas. Although the 7-gene MLST schema is considered the gold standard in H. influenzae typing, with the falling cost of sequencing, whole genome sequencing-based typing methods should be used instead. The cgMLST schema has strong potential to replace the 7-gene MLST scheme as a typing method for H. influenzae.
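The three comparison statistics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions (the Hunter-Gaston form of Simpson's index of diversity, the Hubert-Arabie adjusted Rand coefficient, and an adjusted Wallace coefficient whose expected value under independence is one minus Simpson's index of the target partition). The function names and the six toy isolates are hypothetical; they are not the study's data or its actual tooling, which the abstract does not specify.

```python
from collections import Counter
from math import comb


def simpsons_diversity(labels):
    """Simpson's index of diversity: probability that two isolates drawn
    without replacement fall into different types. Needs >= 2 isolates."""
    n = len(labels)
    counts = Counter(labels).values()
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def _pair_counts(a, b):
    """Count isolate pairs grouped together by both methods, by a, by b,
    and the total number of pairs."""
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    return same_both, same_a, same_b, comb(len(a), 2)


def adjusted_rand(a, b):
    """Chance-corrected overall agreement between two partitions.
    Undefined (division by zero) if both partitions are trivial."""
    same_both, same_a, same_b, total = _pair_counts(a, b)
    expected = same_a * same_b / total
    max_index = (same_a + same_b) / 2
    return (same_both - expected) / (max_index - expected)


def adjusted_wallace(a, b):
    """Chance-corrected probability that two isolates sharing a type under
    method a also share a type under method b (directional: a -> b)."""
    same_both, same_a, _, _ = _pair_counts(a, b)
    wallace = same_both / same_a
    expected = 1.0 - simpsons_diversity(b)  # Wallace under independence
    return (wallace - expected) / (1.0 - expected)


# Hypothetical type assignments for six isolates (illustration only).
mlst_types = ["ST1", "ST1", "ST1", "ST2", "ST2", "ST3"]
cgmlst_types = ["c1", "c1", "c2", "c3", "c3", "c4"]

print("Simpson, 7-gene MLST:", round(simpsons_diversity(mlst_types), 3))
print("Simpson, cgMLST:", round(simpsons_diversity(cgmlst_types), 3))
print("Adjusted Rand:", round(adjusted_rand(mlst_types, cgmlst_types), 3))
print("AW cgMLST -> MLST:", round(adjusted_wallace(cgmlst_types, mlst_types), 3))
print("AW MLST -> cgMLST:", round(adjusted_wallace(mlst_types, cgmlst_types), 3))
```

In this toy data the finer partition has the higher Simpson's index, and the adjusted Wallace coefficient is asymmetric: isolates that share a cgMLST type always share an MLST type, but not the reverse. That is the pattern one would expect when one scheme subdivides the groups of another.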
wgMLST, cgMLST, Haemophilus, influenzae, MLST