Classifying SARS-CoV-2 and common respiratory viruses from genome assemblies

dc.contributor.author | Rahman, Mohaimen
dc.contributor.examiningcommittee | Ashraf, Ahmed (Electrical and Computer Engineering) | en_US
dc.contributor.examiningcommittee | Thulasiraman, Parimala (Computer Science) | en_US
dc.contributor.supervisor | Ferens, Ken
dc.date.accessioned | 2022-12-22T23:22:11Z
dc.date.available | 2022-12-22T23:22:11Z
dc.date.copyright | 2022-12-22
dc.date.issued | 2022-12-13
dc.date.submitted | 2022-12-22T20:26:45Z | en_US
dc.degree.discipline | Electrical and Computer Engineering | en_US
dc.degree.level | Master of Science (M.Sc.) | en_US
dc.description.abstract | Polymerase chain reaction (PCR) testing is widely used for the systematic identification of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) strains. Another approach to identifying SARS-CoV-2 is machine learning classification of genome sequences, which has shown promising results. Although trained clinicians usually classify genome sequences, a machine learning classifier can complement this process by providing a short list of candidates for further analysis. A machine learning approach can derive a distinctive fingerprint from the base pairs of a sequence and yield a quick classification. To this end, we investigated a k-mer approach to classify genome sequences of SARS-CoV-2 and common respiratory viruses, as well as a human genome sequence. We aim to provide a simplified classification approach that keeps validation time in check while requiring limited hyperparameter tuning. Our approach achieved F1 scores above 0.99, and perfect scores when distinguishing among the common respiratory viruses. We demonstrated that a simple 5-base sub-sequencing scheme can differentiate over 7.91 million sequences from almost 20,000 genome assemblies. | en_US
dc.description.note | February 2023 | en_US
dc.identifier.uri | http://hdl.handle.net/1993/37034
dc.language.iso | eng | en_US
dc.rights | open access | en_US
dc.subject | virus dna classification | en_US
dc.subject | classification of sars-cov2 and other respiratory viruses | en_US
dc.title | Classifying SARS-CoV-2 and common respiratory viruses from genome assemblies | en_US
dc.type | master thesis | en_US
local.subject.manitoba | no | en_US
Files
Original bundle
Name: Rahman_Mohaimen.pdf
Size: 900.81 KB
Format: Adobe Portable Document Format
Description: Thesis
License bundle
Name: license.txt
Size: 2.2 KB
Format: Item-specific license agreed to upon submission