Scientific journals : Article
Life sciences : Biochemistry, biophysics & molecular biology
Engineering, computing & technology : Computer science
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.
Lewis, Steven [Institute for Systems Biology]
Csordas, Attila [EMBL European Bioinformatics Institute > PRIDE Group Proteomics Services Team]
Killcoyne, Sarah [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)]
Hermjakob, Henning [EMBL European Bioinformatics Institute > PRIDE Group Proteomics Services Team]
Hoopmann, Michael R. [Institute for Systems Biology]
Moritz, Robert L. [Institute for Systems Biology]
Deutsch, Eric W. [Institute for Systems Biology]
Boyle, John [Institute for Systems Biology]
BMC Bioinformatics
Yes (verified by ORBilu)
[en] Proteomics ; High-performance computing
[en] BACKGROUND: For shotgun mass spectrometry-based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore, solutions for improving our ability to perform these searches are needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
Luxembourg Centre for Systems Biomedicine (LCSB): Computational Biology (Del Sol Group)
NIGMS (USA) R01GM087221 ; NCI (USA) R01CA137442
FP7 ; 260558 - PROTEOMEXCHANGE - International Data Exchange and Data Representation Standards for Proteomics

File(s) associated to this reference

Fulltext file(s):

Open access
hydra-paper.pdf ; Publisher postprint ; 727.15 kB