Spark-based data analytics of sequence motifs in large omics data

dc.contributor.author: Sarumi, Oluwafemi
dc.contributor.author: Leung, Carson
dc.contributor.author: Adetunmbi, Adebayo
dc.date.accessioned: 2019-01-08T18:19:47Z
dc.date.available: 2019-01-08T18:19:47Z
dc.date.issued: 2018
dc.date.submitted: 2019-01-08T07:32:14Z [en]
dc.description.abstract: The data explosion in bioinformatics in recent years has challenged researchers to develop novel techniques for discovering new knowledge from the avalanche of omics data (e.g., genomics, proteomics, transcriptomics). These data are embedded with a wealth of information, including frequently repeated patterns (i.e., sequence motifs). In genomics, deoxyribonucleic acid (DNA) sequence motifs are short, repeated, contiguous frequent subsequences located in the promoter region. Due to the high volume and various degrees of veracity of the DNA datasets generated by next-generation sequencing techniques, sequence motif mining from DNA sequences poses a major challenge in bioinformatics. In this article, we present a distributed sequential algorithm for DNA sequence motif mining, which uses the MapReduce programming model on a cluster of homogeneous distributed-memory systems running the Apache Spark computing framework. Experimental results show the effectiveness of our algorithm in Spark-based data analytics of sequence motifs in large omics data. [en_US]
dc.description.sponsorship: Natural Sciences and Engineering Research Council of Canada (NSERC); Tertiary Education Trust Fund (TETFund) of Nigeria; University of Manitoba [en_US]
dc.identifier.citation: O.A. Sarumi, C.K. Leung, A.O. Adetunmbi. Spark-based data analytics of sequence motifs in large omics data. Procedia Computer Science, 126 (2018), pp. 596-605 [en_US]
dc.identifier.doi: http://dx.doi.org/10.1016/j.procs.2018.07.294
dc.identifier.uri: http://hdl.handle.net/1993/33656
dc.language.iso: eng [en_US]
dc.publisher: Elsevier [en_US]
dc.rights: open access [en_US]
dc.subject: bioinformatics [en_US]
dc.subject: Spark [en_US]
dc.subject: MapReduce [en_US]
dc.subject: deoxyribonucleic acid (DNA) [en_US]
dc.subject: genomics [en_US]
dc.subject: sequence motifs [en_US]
dc.title: Spark-based data analytics of sequence motifs in large omics data [en_US]
dc.type: Article [en_US]
Files
Original bundle
Name: Sarumi_JProCS_126_2018.pdf
Size: 618.58 KB
Format: Adobe Portable Document Format
Description: O.A. Sarumi, C.K. Leung, A.O. Adetunmbi. Spark-based data analytics of sequence motifs in large omics data. Procedia Computer Science, 126 (2018), pp. 596-605. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).