Faculty of Science Scholarly Works

Recent Submissions

Now showing 1 - 5 of 209
  • Item
    Open Access
    Federated learning algorithms for generalized mixed-effects model (GLMM) on horizontally partitioned data from distributed sources
    (2022-10-16) Li, Wentao; Tong, Jiayi; Anjum, Md. M.; Mohammed, Noman; Chen, Yong; Jiang, Xiaoqian
    Abstract. Objectives: This paper develops federated solutions based on two approximation algorithms to fit generalized linear mixed-effects models (GLMMs) in a federated setting. It also proposes a solution for numerical errors and singularity issues, and shows that the two proposed methods perform well in revealing the significance of parameters in distributed datasets, compared to a centralized GLMM algorithm from the R package ‘lme4’ as the baseline model. Methods: The log-likelihood function of the GLMM is approximated by two numerical methods (Laplace approximation and Gaussian Hermite approximation, abbreviated as LA and GH), which support a federated decomposition of the GLMM that brings computation to the data. To address the numerical errors and singularity issues caused by federated settings, the loss-less estimation of the log-sum-exponential trick and an adaptive regularization strategy were used. Results: The proposed method can handle GLMMs that accommodate hierarchical data with multiple non-independent levels of observations in a federated setting. The experimental results demonstrate comparable (LA) and superior (GH) performance on simulated and real-world data. Conclusion: We modified and compared federated GLMMs with different approximations, which can support researchers in analyzing versatile biomedical data to accommodate mixed effects and address non-independence due to hierarchical structures (e.g., institutes, regions, countries).
  • Item
    Open Access
    Central-place foraging poses variable constraints year-round in a neotropical migrant
    (2022-09-20) Lalla, Kristen M.; Fraser, Kevin C.; Frei, Barbara; Fischer, Jason D.; Siegrist, Joe; Ray, James D.; Cohn-Haft, Mario; Elliott, Kyle H.
    Abstract. Background: “Central-place foragers” are constrained in their habitat selection and foraging range by the frequency with which they need to return to a central place. For example, chick-rearing songbirds that must feed their offspring hourly might be expected to have smaller foraging ranges compared to non-breeding songbirds that return nightly to a roost. Methods: We used GPS units to compare the foraging behaviour of an aerial insectivorous bird, the purple martin (Progne subis), during the breeding season in three regions across North America, as well as the non-breeding season in South America. Specifically, we tested foraging range size and habitat selection. Results: Foraging range did not vary among regions during breeding (14.0 ± 39.2 km²) and was larger during the non-breeding period (8840 ± 8150 km²). Purple martins strongly preferred aquatic habitats to other available habitats year-round and in the Amazon commuted from night roosts in low-productivity sediment-poor water, where risk of predation was probably low, to daytime foraging sites in productive sediment-rich water. Conclusions: We provide the first estimates for foraging range size in purple martins and demonstrate foraging preference for aquatic habitats throughout two stages of the annual cycle. Understanding foraging constraints and habitat of aerial insectivores may help plan conservation actions throughout their annual cycle. Future research should quantify foraging behaviour during the post-breeding period and during migration.
  • Item
    Open Access
    Assessing Movement Patterns using Bayesian State-Space Models on Lake Winnipeg Walleye
    (Canadian Journal of Fisheries and Aquatic Sciences, 2021-03-18) Munaweera, Inesh; Muthukumarana, Saman; Gillis, D. M.; Watkinson, D. A.; Charles, C.; Enders, E. C.
    Acoustic telemetry technology is useful for studying fish movement patterns and habitat use. However, the data generated from omnidirectional acoustic receivers are prone to large observation errors, since the tagged animal can be anywhere in the detection range of the receiver. In this study, we used the Bayesian state-space modeling (SSM) approach and different smoothing methods, including kernel smoothing and cross-validated local polynomial regression, to reconstruct movement paths of Walleye (Sander vitreus) using data obtained from a telemetry receiver grid in Lake Winnipeg. Using the SSM approach, we obtained more realistic movement paths than with the smoothing methods. In addition, we highlighted the advantages of the SSM approach over simple smoothing techniques for estimating undetected movement paths by comparing ecological metrics such as path length and tortuosity between the different reconstruction approaches. Reconstructed paths could be useful in making effective fishery management decisions on Lake Winnipeg in the future by providing information on how Walleye move and distribute in Lake Winnipeg over space and time.
  • Item
    Open Access
    Parallel and private generalized suffix tree construction and query on genomic data
    (2022-06-17) Al Aziz, Md M.; Thulasiraman, Parimala; Mohammed, Noman
    Abstract. Background: Several technological advancements and the digitization of healthcare data have provided the scientific community with a large quantity of genomic data. Such datasets have facilitated a deeper understanding of several diseases and of our health in general. However, these genome datasets require a large storage volume and present technical challenges in retrieving meaningful information. Furthermore, the privacy aspects of genomic data limit access and often hinder timely scientific discovery. Methods: In this paper, we utilize the Generalized Suffix Tree (GST), whose construction and applications have been studied fairly extensively in related areas. The main contribution of this article is a privacy-preserving string query execution framework using GSTs and an additional tree-based hashing mechanism. We start by introducing an efficient parallel GST construction that scales to large genomic datasets. The secure indexing scheme allows the genomic data in a GST to be outsourced to an untrusted cloud server under encryption. Additionally, the proposed methods can perform several string search operations (i.e., exact, set-maximal matches) securely and efficiently using the outlined framework. Results: The experimental results on different datasets and parameters in a real cloud environment exhibit the scalability of these methods, which also outperform the state-of-the-art method based on the Burrows-Wheeler Transform (BWT). The proposed method takes only around 36.7 s to execute a set-maximal match, whereas the BWT-based method takes around 160.85 s, a speedup of roughly 4×.
  • Item
    Open Access
    Adaptive multiple imputations of missing values using the class center
    (2022-04-28) Phiwhorm, Kritbodin; Saikaew, Charnnarong; Leung, Carson K.; Polpinit, Pattarawit; Saikaew, Kanda R.
    Abstract. Big data has become a core technology for providing innovative solutions in many fields. However, datasets collected for analysis in various domains often contain missing values. Missing value imputation is the primary method for resolving problems involving incomplete datasets: missing attribute values are replaced with values derived from a selected set of observed data using statistical or machine learning methods. Although machine learning techniques can generate reasonably accurate imputation results, they typically require longer imputation times than statistical techniques. This study proposes the adaptive multiple imputations of missing values using the class center (AMICC) approach to produce effective imputation results efficiently. AMICC is based on the class center and defines a threshold from the weighted distances between the center and other observed data for the imputation step. Additionally, either an adaptive nearest neighborhood or the center can be used to estimate the missing values. The experiments use numerical, categorical, and mixed datasets from the University of California Irvine (UCI) Machine Learning Repository, 27 datasets in total, with introduced missing-value rates from 10 to 50%. The proposed AMICC approach outperforms the other missing value imputation methods, with an average accuracy of 81.48%, about 9 to 14% higher than those of the other methods. Furthermore, its execution time differs from that of the Mean/Mode method by about seven seconds, and it requires about 10 to 14 s less imputation time than some machine learning approaches.
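
The class-center idea named in the abstract above can be illustrated with a small sketch. This is not the AMICC algorithm: it omits the distance threshold and the adaptive nearest neighborhood, handles numerical features only, and simply fills each missing value with the mean of the observed values for the same class and feature. The function name `class_center_impute` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def class_center_impute(rows, labels):
    """Fill NaN entries with the per-class mean of the observed values.

    rows   -- list of equal-length lists of floats; NaN marks a missing value
    labels -- class label for each row
    """
    n_features = len(rows[0])
    # Accumulate the sum and count of observed values per (class, feature).
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if not math.isnan(value):
                sums[label][j] += value
                counts[label][j] += 1

    imputed = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if math.isnan(value):
                if counts[label][j] == 0:
                    raise ValueError(
                        f"no observed values for class {label!r}, feature {j}"
                    )
                # Replace the missing value with the class center.
                value = sums[label][j] / counts[label][j]
            new_row.append(value)
        imputed.append(new_row)
    return imputed


# Toy data (hypothetical): two classes, two numerical features.
nan = float("nan")
data = [[1.0, 2.0], [3.0, nan], [10.0, 20.0], [nan, 40.0]]
classes = ["a", "a", "b", "b"]
filled = class_center_impute(data, classes)
print(filled)
```

In this toy input, the missing second feature of the class "a" row is filled from the only other class "a" observation, and likewise for class "b". A fuller treatment along the lines the abstract describes would also weigh distances to the center before deciding how to impute.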