Parallel and private generalized suffix tree construction and query on genomic data

dc.contributor.author: Al Aziz, Md M.
dc.contributor.author: Thulasiraman, Parimala
dc.contributor.author: Mohammed, Noman
dc.date.accessioned: 2022-07-01T03:36:29Z
dc.date.issued: 2022-06-17
dc.date.updated: 2022-07-01T03:36:29Z
dc.description.abstract: Background: Technological advances and the digitization of healthcare data have given the scientific community a large quantity of genomic data. Such datasets have enabled a deeper understanding of several diseases and of our health in general. However, these genome datasets require large storage volumes and make retrieving meaningful information technically challenging. Furthermore, the privacy aspects of genomic data limit access and often hinder timely scientific discovery. Methods: In this paper, we use the Generalized Suffix Tree (GST), whose construction and applications have been fairly well studied in related areas. The main contribution of this article is a privacy-preserving string query execution framework built on GSTs and an additional tree-based hashing mechanism. We first introduce an efficient parallel GST construction that scales to large genomic datasets. The secure indexing scheme allows the genomic data in a GST to be outsourced, under encryption, to an untrusted cloud server. Using this framework, the proposed methods can also perform several string search operations (i.e., exact and set-maximal matches) securely and efficiently. Results: Experimental results on different datasets and parameters in a real cloud environment demonstrate the scalability of these methods, which also outperform the state-of-the-art method based on the Burrows-Wheeler Transform (BWT). The proposed method takes only about 36.7 s to execute a set-maximal match, whereas the BWT-based method takes about 160.85 s, a roughly 4× speedup.
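The abstract only names the idea of answering exact-match queries with a GST built in parallel. As a minimal sketch under stated assumptions (not the authors' implementation: there is no encryption, no tree-based hashing, and no path compression), the Python below builds an uncompressed generalized suffix trie over a few toy DNA strings and answers "which sequences contain this pattern?". Partitioning suffixes by their leading character, so that each child subtree of the root can be built independently by a separate worker, is an assumed and commonly used parallelization strategy, not a description of the paper's scheme. The names `build_gst`, `exact_match`, `_build_subtree` and the toy sequences are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    __slots__ = ("children", "ids")

    def __init__(self):
        self.children = {}  # next character -> child Node
        self.ids = set()    # ids of sequences having a suffix that passes through this node


def _build_subtree(suffixes):
    """Build the root's child for one leading character from (seq_id, suffix) pairs."""
    top = Node()
    for sid, suffix in suffixes:
        top.ids.add(sid)
        cur = top
        for ch in suffix[1:]:  # suffix[0] is the leading character shared by this group
            cur = cur.children.setdefault(ch, Node())
            cur.ids.add(sid)
    return top


def build_gst(sequences, workers=4):
    """Hypothetical generalized suffix trie built one root subtree per worker."""
    root = Node()
    root.ids.update(range(len(sequences)))  # the empty pattern occurs in every sequence
    # Sequential pass: group all suffixes of all sequences by their first character.
    groups = {}
    for sid, seq in enumerate(sequences):
        for start in range(len(seq)):
            groups.setdefault(seq[start], []).append((sid, seq[start:]))
    # Parallel pass: each group yields an independent subtree of the root.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        subtrees = list(pool.map(_build_subtree, groups.values()))
    root.children = dict(zip(groups.keys(), subtrees))
    return root


def exact_match(root, pattern):
    """Return the ids of sequences that contain `pattern` as a substring."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return set()
    return set(node.ids)


if __name__ == "__main__":
    root = build_gst(["ACGTAC", "GTACGA", "TTTACG"])
    print(sorted(exact_match(root, "GTA")))  # [0, 1]
```

Inserting every suffix costs O(L²) nodes for a sequence of length L, so this sketch suits only toy inputs; a true suffix tree compresses unary paths and can be built in linear time (e.g., with Ukkonen's algorithm). Python threads here only illustrate the task decomposition: because of CPython's global interpreter lock, this pure-Python work will not run faster with threads, and a process pool or a compiled language would be needed for a real speedup.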
dc.identifier.citation: BMC Genomic Data. 2022 Jun 17;23(1):45
dc.identifier.uri: https://doi.org/10.1186/s12863-022-01053-x
dc.identifier.uri: http://hdl.handle.net/1993/36585
dc.language.rfc3066: en
dc.rights: open access (en_US)
dc.rights.holder: The Author(s)
dc.title: Parallel and private generalized suffix tree construction and query on genomic data
dc.type: Journal Article
local.author.affiliation: Faculty of Science (en_US)
Files
Original bundle
Name: 12863_2022_Article_1053.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.24 KB
Format: Item-specific license agreed to upon submission