References of "Couto, Francisco M"
     in
Bookmark and Share    
Full Text
Peer Reviewed
See detailAccurate filtering of privacy-sensitive information in raw genomic data
Decouchant, Jérémie UL; Fernandes, Maria UL; Volp, Marcus UL et al

in Journal of Biomedical Informatics (2018)

Sequencing thousands of human genomes has enabled breakthroughs in many areas, among them precision medicine, the study of rare diseases, and forensics. However, mass collection of such sensitive data ... [more ▼]

Sequencing thousands of human genomes has enabled breakthroughs in many areas, among them precision medicine, the study of rare diseases, and forensics. However, mass collection of such sensitive data entails enormous risks if not protected to the highest standards. In this article, we follow the position and argue that post-alignment privacy is not enough and that data should be automatically protected as early as possible in the genomics workflow, ideally immediately after the data is produced. We show that a previous approach for filtering short reads cannot extend to long reads and present a novel filtering approach that classifies raw genomic data (i.e., whose location and content is not yet determined) into privacy-sensitive (i.e., more affected by a successful privacy attack) and non-privacy-sensitive information. Such a classification allows the fine-grained and automated adjustment of protective measures to mitigate the possible consequences of exposure, in particular when relying on public clouds. We present the first filter that can be indistinctly applied to reads of any length, i.e., making it usable with any recent or future sequencing technologies. The filter is accurate, in the sense that it detects all known sensitive nucleotides except those located in highly variable regions (less than 10 nucleotides remain undetected per genome instead of 100,000 in previous works). It has far less false positives than previously known methods (10% instead of 60%) and can detect sensitive nucleotides despite sequencing errors (86% detected instead of 56% with 2% of mutations). Finally, practical experiments demonstrate high performance, both in terms of throughput and memory consumption. [less ▲]

Detailed reference viewed: 244 (49 UL)
Full Text
Peer Reviewed
See detailCloud-Assisted Read Alignment and Privacy
Fernandes, Maria UL; Decouchant, Jérémie UL; Couto, Francisco M. et al

in 11th International Conference on Practical Applications of Computational Biology & Bioinformatics 2017 (2017)

Thanks to the rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to ... [more ▼]

Thanks to the rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to increase the throughput of the classical DNA workflow, e.g. by relying on the cloud to perform CPU intensive operations. However, the scientific community raised an alarm due to the possible privacy-related attacks that can be executed on genomic data. In this paper we review the state of the art in cloud-based alignment algorithms that have been developed for performance. We then present several privacy-preserving mechanisms that have been, or could be, used to align reads at an incremental performance cost. We finally argue for the use of risk analysis throughout the DNA workflow, to strike a balance between performance and protection of data. [less ▲]

Detailed reference viewed: 181 (39 UL)
Full Text
Peer Reviewed
See detailHow can photo sharing inspire sharing genomes?
Cogo, Vinicius Vielmo; Bessani, Alysson; Couto, Francisco M. et al

in 11th International Conference on Practical Applications of Computational Biology & Bioinformatics 2017 (2017)

People usually are aware of the privacy risks of publish-ing photos online, but these risks are less evident when sharing humangenomes. Modern photos and sequenced genomes are both digital rep ... [more ▼]

People usually are aware of the privacy risks of publish-ing photos online, but these risks are less evident when sharing humangenomes. Modern photos and sequenced genomes are both digital rep-resentations of real lives. They contain private information that maycompromise people’s privacy, and still, their highest value is most oftimes achieved only when sharing them with others. In this work, wepresent an analogy between the privacy aspects of sharing photos andsharing genomes, which clarifies the privacy risks in the latter to thegeneral public. Additionally, we illustrate an alternative informed modelto share genomic data according to the privacy-sensitivity level of eachportion. This article is a call to arms for a collaborative work between ge-neticists and security experts to build more effective methods to system-atically protect privacy, whilst promoting the accessibility and sharingof genomes [less ▲]

Detailed reference viewed: 148 (39 UL)
Full Text
Peer Reviewed
See detailA High-Throughput Method to Detect Privacy-Sensitive Human Genomic Data
Cogo, Vinicius Vielmo; Bessani, Alysson; Couto, Francisco M. et al

in Proceedings of the 14th ACM Workshop on Privacy in the Electronic Society (2015)

Finding the balance between privacy protection and data sharing is one of the main challenges in managing human genomic data nowadays. Novel privacy-enhancing technologies are required to address the ... [more ▼]

Finding the balance between privacy protection and data sharing is one of the main challenges in managing human genomic data nowadays. Novel privacy-enhancing technologies are required to address the known disclosure threats to personal sensitive genomic data without precluding data sharing. In this paper, we propose a method that systematically detects privacy-sensitive DNA segments coming directly from an input stream, using as reference a knowledge database of known privacy-sensitive nucleic and amino acid sequences. We show that adding our detection method to standard security techniques provides a robust, efficient privacy-preserving solution that neutralizes threats related to recently published attacks on genome privacy based on short tandem repeats, disease-related genes, and genomic variations. Current global knowledge on human genomes demonstrates the feasibility of our approach to obtain a comprehensive database immediately, which can also evolve automatically to address future attacks as new privacy-sensitive sequences are identified. Additionally, we validate that the detection method can be fitted inline with the NGS---Next Generation Sequencing---production cycle by using Bloom filters and scaling out to faster sequencing machines. [less ▲]

Detailed reference viewed: 180 (27 UL)