[en] Balancing privacy protection against data sharing is one of the main challenges in managing human genomic data. Novel privacy-enhancing technologies are required to address the known disclosure threats to sensitive personal genomic data without precluding data sharing. In this paper, we propose a method that systematically detects privacy-sensitive DNA segments directly in an input stream, using as reference a knowledge database of known privacy-sensitive nucleic and amino acid sequences. We show that adding our detection method to standard security techniques provides a robust, efficient privacy-preserving solution that neutralizes threats related to recently published attacks on genome privacy based on short tandem repeats, disease-related genes, and genomic variations. Current global knowledge of human genomes shows that a comprehensive database can be built immediately with our approach, and that this database can evolve automatically to address future attacks as new privacy-sensitive sequences are identified. Additionally, we validate that the detection method can be fitted inline with the next-generation sequencing (NGS) production cycle by using Bloom filters and scaling out to faster sequencing machines.
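To make the Bloom-filter idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the knowledge database is indexed as fixed-length substrings (k-mers) and that a read is flagged when any of its k-mers hits the filter. The names `BloomFilter`, `build_filter`, and `is_sensitive`, the value of `k`, the filter size, and the double-hashing scheme are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative parameters)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_filter(sensitive_sequences, k, **filter_args):
    """Index all k-mers of the known sensitive sequences (hypothetical helper)."""
    bf = BloomFilter(**filter_args)
    for seq in sensitive_sequences:
        for kmer in kmers(seq.upper(), k):
            bf.add(kmer)
    return bf


def is_sensitive(read, bf, k):
    """Flag a read if any of its k-mers is (probably) in the database."""
    return any(kmer in bf for kmer in kmers(read.upper(), k))


# Toy database: one short-tandem-repeat-like sequence (made up for illustration).
K = 12
database = ["GATAGATAGATAGATAGATA"]
bf = build_filter(database, K)
```

Because a Bloom filter can return false positives but never false negatives, a design like this errs on the side of flagging too much rather than letting a known sensitive segment through. Membership checks cost a constant number of hash probes per k-mer, which is what makes inline filtering of a read stream plausible.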
Interdisciplinary Centre for Security, Reliability and Trust
European Commission - EC ; Fundação para a Ciência e a Tecnologia
FnR ; FNR8149128 > Paulo Esteves Verissimo > IISD > Strategic RTnD Program on Information Infrastructure Security and Dependability > 01/01/2015 > 31/12/2019 > 2014