A privacy-preserving distributed filtering framework for NLP artifacts

Sadat, Md Nazmus; Aziz, Md Momin Al; Mohammed, Noman; Pakhomov, Serguei; Liu, Hongfang; Jiang, Xiaoqian

dc.contributor.author	Sadat, Md Nazmus
dc.contributor.author	Aziz, Md Momin Al
dc.contributor.author	Mohammed, Noman
dc.contributor.author	Pakhomov, Serguei
dc.contributor.author	Liu, Hongfang
dc.contributor.author	Jiang, Xiaoqian
dc.date.accessioned	2019-10-01T06:06:37Z
dc.date.issued	2019-09-07
dc.date.updated	2019-10-01T06:06:37Z
dc.description.abstract	Background: Medical data sharing is a major challenge in biomedicine and often hinders collaborative research. Due to privacy concerns, clinical notes cannot be shared directly. Much effort has been devoted to de-identifying clinical notes, but it remains very challenging to locate and scrub all sensitive elements from notes accurately and automatically. An alternative approach is to remove sentences that might contain sensitive terms related to personal information. Methods: A previous study introduced a frequency-based filtering approach that removes sentences containing low-frequency bigrams, improving privacy protection without significantly decreasing utility. Our work extends this method to clinical notes held by distributed sources, with security and privacy considerations. We developed a novel secure protocol based on private set intersection and secure thresholding to identify uncommon and low-frequency terms, which can be used to guide sentence filtering. Results: Because the computational cost of our proposed framework depends mostly on the cardinality of the intersection of the sets and on the number of data owners, we evaluated the framework in terms of these two factors. Experimental results demonstrate that the proposed method is scalable in various experimental settings. We also evaluated the framework in terms of data utility; this evaluation shows that the proposed method retains enough information for data analysis. Conclusion: This work demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient multi-party protocol.
dc.identifier.citation	BMC Medical Informatics and Decision Making. 2019 Sep 07;19(1):183
dc.identifier.uri	https://doi.org/10.1186/s12911-019-0867-z
dc.identifier.uri	http://hdl.handle.net/1993/34313
dc.language.rfc3066	en
dc.rights	open access	en_US
dc.rights.holder	The Author(s).
dc.title	A privacy-preserving distributed filtering framework for NLP artifacts
dc.type	Journal Article
local.author.affiliation	Faculty of Science	en_US
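
The abstract describes filtering that removes sentences containing low-frequency bigrams, where frequencies are pooled across several data owners. The sketch below illustrates only that plaintext filtering logic, under stated assumptions: the whitespace tokenizer, the `threshold` parameter, and every function name are hypothetical. It pools raw counts in the clear and is therefore NOT privacy-preserving; the paper's protocol instead performs this step with private set intersection and secure thresholding over homomorphic encryption, which is not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def sentence_bigrams(sentence: str) -> List[Bigram]:
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace.
    tokens = sentence.lower().split()
    return list(zip(tokens, tokens[1:]))


def local_counts(sentences: Iterable[str]) -> Counter:
    # Bigram frequencies computed by one data owner over its own notes.
    counts: Counter = Counter()
    for s in sentences:
        counts.update(sentence_bigrams(s))
    return counts


def rare_bigrams(site_counts: List[Counter], threshold: int) -> Set[Bigram]:
    # Plaintext stand-in for the secure step: sum counts across all sites and
    # flag bigrams whose pooled frequency is below the threshold.
    # NOTE: this reveals every site's counts; it is NOT privacy-preserving.
    total: Counter = Counter()
    for c in site_counts:
        total.update(c)
    return {bg for bg, n in total.items() if n < threshold}


def filter_sentences(sentences: Iterable[str], rare: Set[Bigram]) -> List[str]:
    # Keep only sentences that contain no rare bigram.
    return [s for s in sentences if not any(bg in rare for bg in sentence_bigrams(s))]


if __name__ == "__main__":
    site_a = ["patient denies chest pain", "patient lives in smallville"]
    site_b = ["patient denies chest pain", "patient reports chest pain"]
    rare = rare_bigrams([local_counts(site_a), local_counts(site_b)], threshold=2)
    print(filter_sentences(site_a, rare))  # prints ['patient denies chest pain']
```

In this toy run, "patient lives in smallville" is dropped because its bigrams occur only once across both sites, while the sentence shared by both sites survives.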

Files

Original bundle

Name:: 12911_2019_Article_867.pdf
Size:: 858.94 KB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 2.24 KB
Format:: Item-specific license agreed to upon submission

Collections

University of Manitoba Scholarship
Faculty of Science Scholarly Works