Classifying SARS-CoV-2 and common respiratory viruses from genome assemblies
dc.contributor.author | Rahman, Mohaimen | |
dc.contributor.examiningcommittee | Ashraf, Ahmed (Electrical and Computer Engineering) | en_US |
dc.contributor.examiningcommittee | Thulasiraman, Parimala (Computer Science) | en_US |
dc.contributor.supervisor | Ferens, Ken | |
dc.date.accessioned | 2022-12-22T23:22:11Z | |
dc.date.available | 2022-12-22T23:22:11Z | |
dc.date.copyright | 2022-12-22 | |
dc.date.issued | 2022-12-13 | |
dc.date.submitted | 2022-12-22T20:26:45Z | en_US |
dc.degree.discipline | Electrical and Computer Engineering | en_US |
dc.degree.level | Master of Science (M.Sc.) | en_US |
dc.description.abstract | Polymerase chain reaction (PCR) testing is widely used for the systematic identification of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) strains. An alternative approach, which has shown promising results, is to identify the SARS-CoV-2 virus by machine learning classification of genome sequences. While trained clinicians usually perform the classification of genome sequences, a machine learning classifier can complement the process and provide a short list for further analysis. A machine learning approach can provide a unique fingerprint of base pairs and yield a quick classification. To this end, we investigated a k-mer approach to classify genome sequences of SARS-CoV-2 and common respiratory viruses, as well as a human genome sequence. We aim to provide a simplified classification approach that balances validation time with limited hyperparameter tuning. Our approach achieved F1 scores in excess of 0.99, and perfect scores between the common respiratory viruses. We demonstrated a simple 5-base sub-sequencing scheme that can differentiate over 7.91 million sequences from almost 20 thousand genome assemblies. | en_US |
dc.description.note | February 2023 | en_US |
dc.identifier.uri | http://hdl.handle.net/1993/37034 | |
dc.language.iso | eng | en_US |
dc.rights | open access | en_US |
dc.subject | virus dna classification | en_US |
dc.subject | classification of sars-cov2 and other respiratory viruses | en_US |
dc.title | Classifying SARS-CoV-2 and common respiratory viruses from genome assemblies | en_US |
dc.type | master thesis | en_US |
local.subject.manitoba | no | en_US |
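
The abstract describes a k-mer approach built on 5-base subsequences. As a rough illustration only, the sketch below turns a nucleotide string into a normalized 5-mer frequency vector of the kind that could be fed to a classifier. The function name `kmer_profile`, the use of overlapping windows, the A/C/G/T alphabet, and the normalization are all assumptions made for this example; the record does not state how the thesis extracts or encodes its subsequences.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=5, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: overlapping windows and frequency normalization
    are assumptions, not details taken from the thesis.
    """
    seq = sequence.upper()
    # All len(alphabet)**k possible k-mers (1024 for k=5 over A, C, G, T).
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (such as N) are not in the
    # vocabulary, so they are left out of the total.
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTAC")
    print(len(profile))
```

Because every genome maps to a vector of the same length regardless of its size, vectors like this can be compared directly or passed to a standard classifier.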