Peak detection and statistical analysis of karyotypic variation from flow cytometry data

Henry, Margot J M
Karyotypic variation is observed in fungal microbial populations isolated from ecological, clinical, and industrial environments and is also a hallmark of many types of cancer. To characterize and understand the dynamics of karyotype subpopulations, we require an unbiased computational method to identify different subpopulations and quantify the number of cells within them. Flow cytometry is the gold standard method for measuring the genome size of each cell in a population of interest. Cells within a population are typically measured from all phases of the cell cycle (G0/G1 prior to DNA replication, S phase during replication, and G2/M when cells have doubled their DNA but have not yet divided). Mathematical models can be fit to the distribution of genome sizes to determine the base ploidy of the population. These algorithms only work for single-ploidy populations. When there are multiple subpopulations of mixed ploidy, the researcher must manually divide the original population into subpopulations prior to analysis. This manual step is subject to considerable bias and is not feasible when there are multiple subpopulations. We developed an unbiased method to quantify karyotypic variation in populations from flow cytometry data and will release it as an open-source Bioconductor package, ploidyPeaks. We used the existing flowCore Bioconductor package to load flow cytometry data from reference cell populations with known and variable karyotypes into the R programming language. We used reference populations with known ploidy to determine a threshold for single-ploidy populations and flagged the rest as possible mixed-ploidy populations. We implemented a peak detection algorithm to identify G0/G1 and G2/M populations, identified karyotypic subpopulations within mixed populations, and applied nonlinear least squares fitting to test how well data from each population fit the Dean-Jett-Fox cell cycle models, which provides a confidence term.
Our method improves on existing algorithms by providing a measure of model fit and by quantifying, in an unbiased manner, populations that contain multiple karyotypic subpopulations.
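The peak detection step described above can be illustrated with a minimal sketch. This is not the ploidyPeaks implementation (which targets R and Bioconductor). It is a stdlib-only Python illustration under stated assumptions: the function names (`simulate_profile`, `smooth`, `find_peaks`, `pair_g1_g2`), the smoothing window, the height floor, and the ratio tolerance are all hypothetical. The synthetic profile omits S phase and stands in for a mixed population with two subpopulations, each contributing a G0/G1 peak and a G2/M peak at roughly twice the fluorescence.

```python
import math

def simulate_profile(n_channels, components):
    """Expected counts per fluorescence channel for a sum of Gaussian peaks.
    components: list of (mean_channel, sd, height) tuples (synthetic data)."""
    profile = []
    for ch in range(n_channels):
        total = 0.0
        for mean, sd, height in components:
            total += height * math.exp(-0.5 * ((ch - mean) / sd) ** 2)
        profile.append(total)
    return profile

def smooth(counts, half_window=2):
    """Moving average; the window is truncated at the histogram edges."""
    out = []
    for i in range(len(counts)):
        window = counts[max(0, i - half_window): i + half_window + 1]
        out.append(sum(window) / len(window))
    return out

def find_peaks(counts, half_window=2, min_fraction=0.05):
    """Channels that are local maxima of the smoothed histogram and reach
    at least min_fraction of the tallest smoothed channel."""
    s = smooth(counts, half_window)
    floor = min_fraction * max(s)
    return [i for i in range(1, len(s) - 1)
            if s[i] >= floor and s[i] > s[i - 1] and s[i] >= s[i + 1]]

def pair_g1_g2(peaks, tolerance=0.1):
    """Pair each candidate G0/G1 peak with a peak at about twice its channel."""
    pairs = []
    for g1 in peaks:
        for g2 in peaks:
            if g2 > g1 and abs(g2 / g1 - 2.0) <= tolerance:
                pairs.append((g1, g2))
    return pairs

# Two hypothetical subpopulations: G0/G1 at channels 100 and 150,
# with matching G2/M peaks at channels 200 and 300.
profile = simulate_profile(400, [(100, 4, 1000), (150, 6, 600),
                                 (200, 8, 300), (300, 12, 180)])
peaks = find_peaks(profile)
print(peaks)              # [100, 150, 200, 300]
print(pair_g1_g2(peaks))  # [(100, 200), (150, 300)]
```

Pairing peaks by a twofold ratio is only one plausible heuristic. It cannot, for example, distinguish the G2/M peak of one subpopulation from the G0/G1 peak of another with double the ploidy, which is where a model fit with a goodness-of-fit measure becomes useful.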
Peak Detection, Flow Cytometry, Cell Cycle Models