References of "Fernandes, Maria 50009622"
Peer Reviewed
MaskAl: Privacy Preserving Masked Reads Alignment using Intel SGX
Lambert, Christoph UL; Fernandes, Maria UL; Decouchant, Jérémie UL et al

Scientific Conference (2018)

The recent introduction of new DNA sequencing techniques has caused the amount of processed and stored biological data to skyrocket. In order to process these vast amounts of data, bio-centers have been tempted to use low-cost public clouds. However, genomes are privacy sensitive, since they store personal information about their donors, such as their identity, disease risks, heredity and ethnic origin. The first critical DNA processing step that can be executed in a cloud, i.e., read alignment, consists of finding the location of the DNA sequences produced by a sequencing machine in the human genome. While recent developments aim at increasing performance, only a few approaches address the need for fast and privacy-preserving read alignment methods. This paper introduces MaskAl, a novel approach for read alignment. MaskAl combines a fast preprocessing step on raw genomic data (filtering and masking) with established algorithms to align sanitized reads, from which sensitive parts have been masked out, and refines the alignment score using the masked-out information within Intel Software Guard Extensions (SGX). MaskAl is a highly competitive privacy-preserving read alignment software that can be massively parallelized with public clouds and emerging enclave clouds. Finally, MaskAl is nearly as accurate as plain-text approaches (more than 96% of reads aligned with MaskAl compared to 98% with BWA) and can process alignment workloads 87% faster than current privacy-preserving approaches while using less memory and network bandwidth.

Peer Reviewed
Accurate filtering of privacy-sensitive information in raw genomic data
Decouchant, Jérémie UL; Fernandes, Maria UL; Volp, Marcus UL et al

in Journal of Biomedical Informatics (2018)

Sequencing thousands of human genomes has enabled breakthroughs in many areas, among them precision medicine, the study of rare diseases, and forensics. However, mass collection of such sensitive data entails enormous risks if not protected to the highest standards. In this article, we take the position that post-alignment privacy is not enough and that data should be automatically protected as early as possible in the genomics workflow, ideally immediately after the data is produced. We show that a previous approach for filtering short reads cannot be extended to long reads, and present a novel filtering approach that classifies raw genomic data (i.e., data whose location and content are not yet determined) into privacy-sensitive (i.e., more affected by a successful privacy attack) and non-privacy-sensitive information. Such a classification allows fine-grained and automated adjustment of protective measures to mitigate the possible consequences of exposure, in particular when relying on public clouds. We present the first filter that can be applied uniformly to reads of any length, making it usable with any recent or future sequencing technology. The filter is accurate, in the sense that it detects all known sensitive nucleotides except those located in highly variable regions (fewer than 10 nucleotides remain undetected per genome, instead of 100,000 in previous works). It has far fewer false positives than previously known methods (10% instead of 60%) and can detect sensitive nucleotides despite sequencing errors (86% detected instead of 56% with 2% mutations). Finally, practical experiments demonstrate high performance, both in terms of throughput and memory consumption.

Peer Reviewed
Enclave-Based Privacy-Preserving Alignment of Raw Genomic Information
Volp, Marcus UL; Decouchant, Jérémie UL; Lambert, Christoph UL et al

Scientific Conference (2017, October)

Recent breakthroughs in genomic sequencing led to an enormous increase in DNA sampling rates, which in turn favored the use of clouds to efficiently process huge amounts of genomic data. However, while allowing possible achievements in personalized medicine and related areas, cloud-based processing of genomic information also entails significant privacy risks, calling for increased protection. In this paper, we focus on the first, but also most data-intensive, processing step of the genomics information processing pipeline: the alignment of raw genomic data samples (called reads) to a synthetic human reference genome. Even though privacy-preserving alignment solutions (e.g., based on homomorphic encryption) have been proposed, their slow performance encourages alternatives based on trusted execution environments, such as Intel SGX, to speed up secure alignment. Such alternatives have to deal with data structures whose size far exceeds secure enclave memory, requiring the alignment code to reach out into untrusted memory. We highlight how sensitive genomic information can be leaked when those enclave-external alignment data structures are accessed, and suggest countermeasures to prevent privacy breaches. The overhead of these countermeasures indicates that the competitiveness of privacy-preserving enclave-based alignment has yet to be precisely evaluated.

Peer Reviewed
How can photo sharing inspire sharing genomes?
Cogo, Vinicius Vielmo; Bessani, Alysson; Couto, Francisco M. et al

in 11th International Conference on Practical Applications of Computational Biology & Bioinformatics 2017 (2017)

People are usually aware of the privacy risks of publishing photos online, but these risks are less evident when sharing human genomes. Modern photos and sequenced genomes are both digital representations of real lives. They contain private information that may compromise people’s privacy, and still, their highest value is most often achieved only when sharing them with others. In this work, we present an analogy between the privacy aspects of sharing photos and sharing genomes, which clarifies the privacy risks of the latter to the general public. Additionally, we illustrate an alternative informed model to share genomic data according to the privacy-sensitivity level of each portion. This article is a call to arms for collaborative work between geneticists and security experts to build more effective methods to systematically protect privacy, whilst promoting the accessibility and sharing of genomes.

Peer Reviewed
Cloud-Assisted Read Alignment and Privacy
Fernandes, Maria UL; Decouchant, Jérémie UL; Couto, Francisco M. et al

in 11th International Conference on Practical Applications of Computational Biology & Bioinformatics 2017 (2017)

Thanks to rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to increase the throughput of the classical DNA workflow, e.g., by relying on the cloud to perform CPU-intensive operations. However, the scientific community has raised an alarm about the possible privacy-related attacks that can be executed on genomic data. In this paper, we review the state of the art in cloud-based alignment algorithms that have been developed for performance. We then present several privacy-preserving mechanisms that have been, or could be, used to align reads at an incremental performance cost. We finally argue for the use of risk analysis throughout the DNA workflow, to strike a balance between performance and protection of data.