binny: an automated binning algorithm to recover high-quality genomes from complex metagenomic datasets

The reconstruction of genomes is a critical step in genome-resolved metagenomics and for multi-omic data integration from microbial communities. Here, we present binny, a binning tool that produces high-quality metagenome-assembled genomes (MAGs) from both contiguous and highly fragmented genomes. Based on established metrics, binny outperforms or is highly competitive with commonly used and state-of-the-art binning methods and finds unique genomes that could not be detected by other methods. binny uses k-mer composition and coverage by metagenomic reads for iterative, nonlinear dimension reduction of genomic signatures, followed by automated contig clustering with cluster assessment using lineage-specific marker gene sets. When compared with seven widely used binning algorithms, binny provides substantial numbers of uniquely identified MAGs and almost always recovers the most near-complete (>95% pure, >90% complete) and high-quality (>90% pure, >70% complete) genomes from simulated datasets from the Critical Assessment of Metagenome Interpretation initiative. It also recovers substantially more high-quality draft genomes, as defined by the Minimum Information about a Metagenome-Assembled Genome standard, than any other tested method from a real-world benchmark comprising metagenomes from various environments.
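As a rough illustration of two ideas named in the abstract, the sketch below computes a k-mer composition profile for a contig and assigns a bin to one of the two quality tiers using the purity and completeness thresholds quoted above. It is not binny's implementation: the function names `kmer_profile` and `quality_tier`, the default `k = 4`, and the choice to ignore k-mers containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every A/C/G/T k-mer in a contig.

    Hypothetical helper; k = 4 is an assumed default, not a value
    taken from the abstract.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all 4**k unambiguous k-mers; windows with N etc. are ignored.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


def quality_tier(purity: float, completeness: float) -> str:
    """Classify a bin using the thresholds stated in the abstract (percent)."""
    if purity > 95 and completeness > 90:
        return "near-complete"
    if purity > 90 and completeness > 70:
        return "high-quality"
    return "other"


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=2)
    print(len(profile), profile["AC"])
    print(quality_tier(96.0, 91.0))
```

In a real binning workflow, profiles like these would be combined with read coverage before dimension reduction and clustering. Those steps are omitted here because the abstract does not specify their parameters.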

Research center :

- Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group)
- Luxembourg Centre for Systems Biomedicine (LCSB): Eco-Systems Biology (Wilmes Group)

Disciplines :

Genetics & genetic processes
Physical, chemical, mathematical & earth Sciences: Multidisciplinary, general & others

Author, co-author :

HICKL, Oskar ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core

Queirós, Pedro

WILMES, Paul ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Systems Ecology

MAY, Patrick ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core

Heintz-Buschart, Anna

External co-authors :

yes

Language :

English

Title :

binny: an automated binning algorithm to recover high-quality genomes from complex metagenomic datasets

Publication date :

13 October 2022

Journal title :

Briefings in Bioinformatics

ISSN :

1467-5463

eISSN :

1477-4054

Publisher :

Oxford University Press, United Kingdom

Peer reviewed :

Peer Reviewed verified by ORBi

Focus Area :

Systems Biomedicine

Additional URL :

https://doi.org/10.1093/bib/bbac431

FnR Project :

FNR11823097 - Microbiomes In One Health, 2017 (01/09/2018-28/02/2025) - Paul Wilmes

Funders :

FNR - Fonds National de la Recherche

Available on ORBilu :

since 14 November 2022

Statistics

Number of views

265 (4 by Unilu)

Number of downloads

211 (1 by Unilu)

