The human microbiome has become an area of intense research because of its potential impact on human health. However, analyzing and interpreting microbiome data has proven challenging because of its complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationships, even with limited prior knowledge. As a result, software designed specifically for the analysis and interpretation of microbiome data with ML techniques has grown rapidly. These tools incorporate a wide range of ML algorithms for clustering, classification, regression, and feature selection, which are used to identify microbial patterns and relationships within the data and to generate predictive models. This rapid development, together with a constant need for new tools and for the integration of new features, requires an effort to compile, catalog, and classify these tools in order to create infrastructures and services with easy, transparent, and trustworthy standards. Here we review the state of the art of ML tools applied in human microbiome studies, a review performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the ML-based software and framework resources currently available for the analysis of microbiome data in humans. The aim is to help microbiologists and biomedical scientists explore specialized resources that integrate ML techniques, and to facilitate future benchmarking that can lead to standards for the analysis of microbiome data. The software resources are organized by the type of analysis they were developed for and by the ML techniques they implement. Each tool is described with examples of usage, including comments on the pitfalls and shortcomings of applying ML-based software to microbiome data that developers and users need to consider.
This review is, to date, an extensive compilation of these resources, offering insights and guidance for researchers interested in applying ML approaches to microbiome analysis.
Research center :
Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group)
Luxembourg Centre for Systems Biomedicine (LCSB): Eco-Systems Biology (Wilmes Group)
Disciplines :
Environmental sciences & ecology
Microbiology
Author, co-author :
Marcos-Zambrano, Laura Judith
López-Molina, Víctor Manuel
Bakir-Gungor, Burcu
Frohme, Marcus
Karaduzovic-Hadziabdic, Kanita
Klammsteiner, Thomas
Ibrahimi, Eliana
Lahti, Leo
Loncar-Turukalo, Tatjana
Dhamo, Xhilda
Simeon, Andrea
Nechyporenko, Alina
Pio, Gianvito
Przymus, Piotr
Sampri, Alexia
Trajkovik, Vladimir
Lacruz-Pleguezuelos, Blanca
Aasmets, Oliver
Araujo, Ricardo
Anagnostopoulos, Ioannis
Aydemir, Önder
Berland, Magali
Calle, M. Luz
Ceci, Michelangelo
Duman, Hatice
Gündoğdu, Aycan
Havulinna, Aki S.
Kaka Bra, Kardokh Hama Najib
Kalluci, Eglantina
Karav, Sercan
Lode, Daniel
Lopes, Marta B.
MAY, Patrick ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
Adapsyn Bioscience (2022). Available at: https://adapsyn.com/.
Al-Ajlan A. El Allali A. (2019). CNN-MGP: convolutional neural networks for metagenomics gene prediction. Interdiscip. Sci. Comput. Life Sci. 11, 628–635. doi: 10.1007/s12539-018-0313-4, PMID: 30588558
Albanese D. Fontana P. de Filippo C. Cavalieri D. Donati C. (2015). MICCA: a complete and accurate software for taxonomic profiling of metagenomic data. Sci. Rep. 5:9743. doi: 10.1038/srep09743, PMID: 25988396
Alneberg J. Bjarnason B. S. de Bruijn I. Schirmer M. Quick J. Ijaz U. Z. et al. (2014). Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146. doi: 10.1038/nmeth.3103, PMID: 25218180
Arango-Argoty G. Garner E. Pruden A. Heath L. S. Vikesland P. Zhang L. (2018). DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome 6:23. doi: 10.1186/s40168-018-0401-z, PMID: 29391044
Armour C. R. Topçuoğlu B. D. Garretto A. Schloss P. D. (2022). A goldilocks principle for the gut microbiome: taxonomic resolution matters for microbiome-based classification of colorectal cancer. MBio 13, e03161–e03121. doi: 10.1128/mbio.03161-21
Arndt D. Xia J. Liu Y. Zhou Y. Guo A. C. Cruz J. A. et al. (2012). METAGENassist: a comprehensive web server for comparative metagenomics. Nucleic Acids Res. 40, W88–W95. doi: 10.1093/nar/gks497, PMID: 22645318
Atlas Biomed (2022). Available at: https://atlasbiomed.com/uk.
Bakir-Gungor B. Bulut O. Jabeer A. Nalbantoglu O. U. Yousef M. (2021). Discovering potential taxonomic biomarkers of Type 2 diabetes from human gut microbiota via different feature selection methods. Front. Microbiol. 12:628426. doi: 10.3389/fmicb.2021.628426, PMID: 34512559
Bakir-Gungor B. Hacılar H. Jabeer A. Nalbantoglu O. U. Aran O. Yousef M. (2022). Inflammatory bowel disease biomarkers of human gut microbiota selected via different feature selection methods. PeerJ 10:e13205. doi: 10.7717/peerj.13205, PMID: 35497193
Baldini F. Heinken A. Heirendt L. Magnusdottir S. Fleming R. M. T. Thiele I. (2019). The Microbiome Modeling Toolbox: from microbial interactions to personalized microbial communities. Bioinformatics 35, 2332–2334. doi: 10.1093/bioinformatics/bty941, PMID: 30462168
Balech B. Brennan L. Carrillo de Santa Pau E. Cavalieri D. Coort S. D’Elia D. et al. (2022). The future of food and nutrition in ELIXIR. F1000Res 11:978. doi: 10.12688/f1000research.51747.1
Bates S. Tibshirani R. (2019). Log-ratio lasso: Scalable, sparse estimation for log-ratio models. Biom. Bull. 75, 613–624. doi: 10.1111/biom.12995, PMID: 30387139
Belcour A. Frioux C. Aite M. Bretaudeau A. Hildebrand F. Siegel A. (2020). Metage2Metabo, microbiota-scale metabolic complementarity for the identification of key species. elife 9:e61968. doi: 10.7554/eLife.61968, PMID: 33372654
Bokulich N. A. Dillon M. R. Zhang Y. Rideout J. R. Bolyen E. Li H. et al. (2018b). q2-longitudinal: longitudinal and paired-sample analyses of microbiome data. mSystems 3, e00219–e00218. doi: 10.1128/mSystems.00219-18
Bokulich N. A. Kaehler B. D. Rideout J. R. Dillon M. Bolyen E. Knight R. et al. (2018a). Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome 6:90. doi: 10.1186/s40168-018-0470-z, PMID: 29773078
Bolyen E. Rideout J. R. Dillon M. R. Bokulich N. A. Abnet C. C. al-Ghalith G. A. et al. (2019). Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857. doi: 10.1038/s41587-019-0209-9, PMID: 31341288
Borozan I. Watt S. Ferretti V. (2015). Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification. Bioinformatics 31, 1396–1404. doi: 10.1093/bioinformatics/btv006, PMID: 25573913
Boycott K. M. Hartley T. Biesecker L. G. Gibbs R. A. Innes A. M. Riess O. et al. (2019). A Diagnosis for All Rare Genetic Diseases: The Horizon and the Next Frontiers. Cells 177, 32–37. doi: 10.1016/j.cell.2019.02.040, PMID: 30901545
Brady A. Salzberg S. L. (2009). Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat. Methods 6, 673–676. doi: 10.1038/nmeth.1358, PMID: 19648916
Cabassi A. Kirk P. D. W. (2020). Multiple kernel learning for integrative consensus clustering of omic datasets. Bioinformatics 36, 4789–4796. doi: 10.1093/bioinformatics/btaa593, PMID: 32592464
Callahan B. J. McMurdie P. J. Holmes S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 11, 2639–2643. doi: 10.1038/ismej.2017.119, PMID: 28731476
Calle M. L. Pujolassos M. Susin A. (2023). coda4microbiome: compositional data analysis for microbiome cross-sectional and longitudinal studies. BMC Bioinform. 24:82. doi: 10.1186/s12859-023-05205-3, PMID: 36879227
Carrieri A. P. Haiminen N. Maudsley-Barton S. Gardiner L. J. Murphy B. Mayes A. E. et al. (2021). Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences. Sci. Rep. 11:4565. doi: 10.1038/s41598-021-83922-6, PMID: 33633172
Ceci M. Pio G. Kuzmanovski V. Džeroski S. (2015). Semi-supervised multi-view learning for gene network reconstruction. PLoS One 10:e0144031. doi: 10.1371/journal.pone.0144031, PMID: 26641091
Chapelle O. Schölkopf B. Zien A.. (2010). Semi-Supervised Learning. 2nd Edn Cambridge, Massachusetts. London, England: The MIT Press.
Chen W. Zhang C. K. Cheng Y. Zhang S. Zhao H. (2013). A Comparison of methods for clustering 16S rRNA sequences into OTUs. PLoS One 8:e70837. doi: 10.1371/journal.pone.0070837, PMID: 23967117
Cheng L. Walker A. W. Corander J. (2012). Bayesian estimation of bacterial community composition from 454 sequencing data. Nucleic Acids Res. 40, 5240–5249. doi: 10.1093/nar/gks227, PMID: 22406836
Chiarello M. McCauley M. Villéger S. Jackson C. R. (2022). Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold. PLoS One 17:e0264443. doi: 10.1371/journal.pone.0264443, PMID: 35202411
Chroneos Z. C. (2010). Metagenomics: Theory, methods, and applications. Hum. Genomics 4:282. doi: 10.1186/1479-7364-4-4-282
Coenders G. Greenacre M. (2022). Three approaches to supervised learning for compositional data with pairwise logratios. J. Appl. Stat., 1–22. doi: 10.1080/02664763.2022.2108007
Cole J. R. Wang Q. Cardenas E. Fish J. Chai B. Farris R. J. et al. (2009). The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37, D141–D145. doi: 10.1093/nar/gkn879, PMID: 19004872
Cui H. Zhang X. (2013). Alignment-free supervised classification of metagenomes by recursive SVM. BMC Genomics 14:641. doi: 10.1186/1471-2164-14-641, PMID: 24053649
Curry K. D. Nute M. G. Treangen T. J. (2021). It takes guts to learn: machine learning techniques for disease detection from the gut microbiome. Emerg. Topics Life Sci. 5, 815–827. doi: 10.1042/ETLS20210213, PMID: 34779841
de Jesus V. C. Khan M. W. Mittermuller B. A. Duan K. Hu P. Schroth R. J. et al. (2021). Characterization of supragingival plaque and oral swab microbiomes in children with severe early childhood caries. Front. Microbiol. 12:683685. doi: 10.3389/fmicb.2021.683685, PMID: 34248903
de Nies L. Lopes S. Busi S. B. Galata V. Heintz-Buschart A. Laczny C. C. et al. (2021). PathoFact: a pipeline for the prediction of virulence factors and antimicrobial resistance genes in metagenomic data. Microbiome 9:49. doi: 10.1186/s40168-020-00993-9, PMID: 33597026
Diener C. Gibbons S. M. Resendis-Antonio O. (2020). MICOM: metagenome-scale modeling to infer metabolic interactions in the gut microbiota. mSystems 5, e00606–e00619. doi: 10.1128/mSystems.00606-19
Dietrich A. Matchado M. S. Zwiebel M. Ölke B. Lauber M. Lagkouvardos I. et al. (2022). Namco: a microbiome explorer. Microb. Genom. 8:mgen000852. doi: 10.1099/mgen.0.000852, PMID: 35917163
Ding X. Cheng F. Cao C. Sun X. (2015). DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection. BMC Bioinform. 16:323. doi: 10.1186/s12859-015-0753-3, PMID: 26446672
Duda R. O. Hart P. E. Stork D. G. (2001). Pattern classification. 2nd Edn Hoboken, New Jersey, U.S.: Wiley.
Ebrahim A. Lerman J. A. Palsson B. O. Hyduke D. R. (2013). COBRApy: COnstraints-based reconstruction and analysis for python. BMC Syst. Biol. 7:74. doi: 10.1186/1752-0509-7-74, PMID: 23927696
Edgar R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461. doi: 10.1093/bioinformatics/btq461, PMID: 20709691
Edgar R. C. (2013). UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods 10, 996–998. doi: 10.1038/nmeth.2604, PMID: 23955772
Eren A. M. Maignien L. Sul W. J. Murphy L. G. Grim S. L. Morrison H. G. et al. (2013). Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data. Methods Ecol. Evol. 4, 1111–1119. doi: 10.1111/2041-210X.12114, PMID: 24358444
European Commission Directorate General for Research and Innovation. and EOSC Executive Board (2021). EOSC interoperability framework: report from the EOSC Executive Board Working Groups FAIR and Architecture. Publications Office.
Faust K. Bauchinger F. Laroche B. de Buyl S. Lahti L. Washburne A. D. et al. (2018). Signatures of ecological processes in microbial community time series. Microbiome 6:120. doi: 10.1186/s40168-018-0496-2, PMID: 29954432
Feng Q. Liang S. Jia H. Stadlmayr A. Tang L. Lan Z. et al. (2015). Gut microbiome development along the colorectal adenoma–carcinoma sequence. Nat. Commun. 6:6528. doi: 10.1038/ncomms7528, PMID: 25758642
Fernandes A. D. Macklaim J. M. Linn T. G. Reid G. Gloor G. B. (2013). ANOVA-Like differential expression (ALDEx) analysis for mixed population RNA-Seq. PLoS One 8:e67019. doi: 10.1371/journal.pone.0067019, PMID: 23843979
Fernandes A. D. Reid J. N. S. Macklaim J. M. McMurrough T. A. Edgell D. R. Gloor G. B. (2014). Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2:15. doi: 10.1186/2049-2618-2-15, PMID: 24910773
Fierer N. Lauber C. L. Zhou N. McDonald D. Costello E. K. Knight R. (2010). Forensic identification using skin bacterial communities. Proc. Natl. Acad. Sci. U. S. A. 107, 6477–6481. doi: 10.1073/pnas.1000162107, PMID: 20231444
FINRISK (2022). Heart failure and microbiome.
Gao X. Lin H. Dong Q. Rho M. Wang L. (2017). A dirichlet-multinomial bayes classifier for disease diagnosis with microbial compositions. mSphere 2, e00536–e00517. doi: 10.1128/mSphereDirect.00536-17
García-Jiménez B. Muñoz J. Cabello S. Medina J. Wilkinson M. D. (2021). Predicting microbiomes through a deep latent space. Bioinformatics 37, 1444–1451. doi: 10.1093/bioinformatics/btaa971, PMID: 33289510
Ghannam R. B. Techtmann S. M. (2021). Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring. Comput. Struct. Biotechnol. J. 19, 1092–1107. doi: 10.1016/j.csbj.2021.01.028, PMID: 33680353
Ghodsi M. Liu B. Pop M. (2011). DNACLUST: accurate and efficient clustering of phylogenetic marker genes. BMC Bioinform. 12:271. doi: 10.1186/1471-2105-12-271, PMID: 21718538
Gloor G. B. Macklaim J. M. Fernandes A. D. (2016). Displaying variation in large datasets: plotting a visual summary of effect sizes. J. Comput. Graph. Stat. 25, 971–979. doi: 10.1080/10618600.2015.1131161
Goodswen S. J. Barratt J. L. N. Kennedy P. J. Kaufer A. Calarco L. Ellis J. T. (2021). Machine learning and applications in microbiology. FEMS Microbiol. Rev. 45:fuab015. doi: 10.1093/femsre/fuab015, PMID: 33724378
Gordon-Rodriguez E. Quinn T. P. Cunningham J. P. (2021). Learning sparse log-ratios for high-throughput sequencing data. Bioinformatics 38, 157–163. doi: 10.1093/bioinformatics/btab645, PMID: 34498030
Hai Nguyen T. et al. (2019). “Disease Prediction Using Synthetic Image Representations of Metagenomic Data and Convolutional Neural Networks.” in 2019 IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF), Danang, Vietnam. pp. 1–6
Hao X. Jiang R. Chen T. (2011). Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering. Bioinformatics 27, 611–618. doi: 10.1093/bioinformatics/btq725, PMID: 21233169
Heinken A. Basile A. Thiele I. (2021). Advances in constraint-based modelling of microbial communities. Curr. Opin. Syst. Biol. 27:100346. doi: 10.1016/j.coisb.2021.05.007
Heinken A. Thiele I. (2022). Microbiome Modelling Toolbox 2.0: efficient, tractable modelling of microbiome communities. Bioinformatics 38, 2367–2368. doi: 10.1093/bioinformatics/btac082, PMID: 35157025
Heinken A. Acharya G. Ravcheev D. A. Hertel J. Nyga M. Okpala O. E. et al. (2020). AGORA2: Large scale reconstruction of the microbiome highlights wide-spread drug-metabolising capacities. Syst. Biol. doi: 10.1101/2020.11.09.375451
Heirendt L. Arreckx S. Pfau T. Mendoza S. N. Richelle A. Heinken A. et al. (2019). Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat. Protoc. 14, 639–702. doi: 10.1038/s41596-018-0098-2, PMID: 30787451
Henry C. S. DeJongh M. Best A. A. Frybarger P. M. Linsay B. Stevens R. L. (2010). High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977–982. doi: 10.1038/nbt.1672, PMID: 20802497
Hickl O. Queirós P. Wilmes P. May P. Heintz-Buschart A. (2022). Binny: an automated binning algorithm to recover high-quality genomes from complex metagenomic datasets. Brief. Bioinform. 23:bbac431. doi: 10.1093/bib/bbac431, PMID: 36239393
Ho Tin Kam (1995). “Random decision forests.” in Proceedings of 3rd International Conference on Document Analysis and Recognition, Montreal, Que., Canada. 1, pp. 278–282
Hoarfrost A. Aptekmann A. Farfañuk G. Bromberg Y. (2022). Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter. Nat. Commun. 13:2606. doi: 10.1038/s41467-022-30070-8, PMID: 35545619
Hoff K. J. Lingner T. Meinicke P. Tech M. (2009). Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res. 37, W101–W105. doi: 10.1093/nar/gkp327, PMID: 19429689
Hoff K. J. Tech M. Lingner T. Daniel R. Morgenstern B. Meinicke P. (2008). Gene prediction in metagenomic fragments: a large scale machine learning approach. BMC Bioinform. 9:217. doi: 10.1186/1471-2105-9-217, PMID: 18442389
Holmes I. Harris K. Quince C. (2012). Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS One 7:e30126. doi: 10.1371/journal.pone.0030126, PMID: 22319561
Huse S. M. Welch D. M. Morrison H. G. Sogin M. L. (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering: Ironing out the wrinkles in the rare biosphere. Environ. Microbiol. 12, 1889–1898. doi: 10.1111/j.1462-2920.2010.02193.x, PMID: 20236171
Jääskinen V. Parkkinen V. Cheng L. Corander J. (2014). Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model. Stat. Appl. Genet. Mol. Biol. 13, 105–121. doi: 10.1515/sagmb-2013-0031, PMID: 24246289
Jin B. T. Xu F. Ng R. T. Hogg J. C. (2022). Mian: interactive web-based microbiome data table visualization and machine learning platform. Bioinformatics 38, 1176–1178. doi: 10.1093/bioinformatics/btab754, PMID: 34788784
Kaehler B. D. Bokulich N. A. McDonald D. Knight R. Caporaso J. G. Huttley G. A. (2019). Species abundance information improves sequence taxonomy classification accuracy. Nat. Commun. 10:4643. doi: 10.1038/s41467-019-12669-6, PMID: 31604942
Kariin S. Burge C. (1995). Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 11, 283–290. doi: 10.1016/S0168-9525(00)89076-9, PMID: 7482779
Karlsson F. H. Tremaroli V. Nookaew I. Bergström G. Behre C. J. Fagerberg B. et al. (2013). Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103. doi: 10.1038/nature12198, PMID: 23719380
Karp P. D. Latendresse M. Paley S. M. Krummenacker M. Ong Q. D. Billington R. et al. (2016). Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology. Brief. Bioinform. 17, 877–890. doi: 10.1093/bib/bbv079, PMID: 26454094
Kartal E. Schmidt T. S. B. Molina-Montes E. Rodríguez-Perales S. Wirbel J. Maistrenko O. M. et al. (2022). A faecal microbiota signature with high specificity for pancreatic cancer. Gut 71, 1359–1372. doi: 10.1136/gutjnl-2021-324755, PMID: 35260444
Keilwagen J. Hartung F. Grau J. (2019). “GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data” in Gene Prediction 1962. ed. Kollmar M. (New York: Springer), 161–177.
Kelley D. R. Liu B. Delcher A. L. Pop M. Salzberg S. L. (2012). Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res. 40:e9. doi: 10.1093/nar/gkr1067, PMID: 22102569
Lapp Z. Han J. H. Wiens J. Goldstein E. J. C. Lautenbach E. Snitkin E. S. (2021). Patient and microbial genomic factors associated with carbapenem-resistant Klebsiella pneumoniae extraintestinal colonization and infection. mSystems 6, e00177–e00121. doi: 10.1128/mSystems.00177-21
Larsen P. E. Collart F. R. Field D. Meyer F. Keegan K. P. Henry C. S. et al. (2011). Predicted Relative Metabolomic Turnover (PRMT): determining metabolic turnover from a coastal marine metagenomic dataset. Microb. Informat. Exp. 1:4. doi: 10.1186/2042-5783-1-4, PMID: 22587810
le Chatelier E. Nielsen T. Qin J. Prifti E. Hildebrand F. Falony G. et al. (2013). Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546. doi: 10.1038/nature12506
Lee K. A. Thomas A. M. Bolte L. A. Björk J. R. de Ruijter L. K. Armanini F. et al. (2022). Cross-cohort gut microbiome associations with immune checkpoint inhibitor response in advanced melanoma. Nat. Med. 28, 535–544. doi: 10.1038/s41591-022-01695-5, PMID: 35228751
Lesniak N. A. Schubert A. M. Flynn K. J. Leslie J. L. Sinani H. Bergin I. L. et al. (2022). The gut bacterial community potentiates clostridioides difficile infection severity. MBio 13, e01183–e01122. doi: 10.1128/mbio.01183-22
Lewin H. A. Robinson G. E. Kress W. J. Baker W. J. Coddington J. Crandall K. A. et al. (2018). Earth BioGenome Project: Sequencing life for the future of life. Proc. Natl. Acad. Sci. U. S. A. 115, 4325–4333. doi: 10.1073/pnas.1720115115, PMID: 29686065
Li W. Godzik A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659. doi: 10.1093/bioinformatics/btl158
Li W. Jaroszewski L. Godzik A. (2001). Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282–283. doi: 10.1093/bioinformatics/17.3.282, PMID: 11294794
Liang Q. Bible P. W. Liu Y. Zou B. Wei L. (2020). DeepMicrobes: taxonomic classification for metagenomics with deep learning. NAR Genom. Bioinform. 2:lqaa009. doi: 10.1093/nargab/lqaa009, PMID: 33575556
Lin H. Eggesbø M. Peddada S. D. (2022). Linear and nonlinear correlation estimators unveil undescribed taxa interactions in microbiome data. Nat. Commun. 13:4946. doi: 10.1038/s41467-022-32243-x, PMID: 35999204
Lin H. Peddada S. D. (2020). Analysis of compositions of microbiomes with bias correction. Nat. Commun. 11:3514. doi: 10.1038/s41467-020-17041-7, PMID: 32665548
Lindahl B. D. Nilsson R. H. Tedersoo L. Abarenkov K. Carlsen T. Kjøller R. et al. (2013). Fungal community analysis by high-throughput sequencing of amplified markers – a user’s guide. New Phytol. 199, 288–299. doi: 10.1111/nph.12243, PMID: 23534863
Liu C.-C. Dong S. S. Chen J. B. Wang C. Ning P. Guo Y. et al. (2022). MetaDecoder: a novel method for clustering metagenomic contigs. Microbiome 10:46. doi: 10.1186/s40168-022-01237-8, PMID: 35272700
Liu Y. Guo J. Hu G. Zhu H. (2013). Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinform. 14:S12. doi: 10.1186/1471-2105-14-S5-S12, PMID: 23735199
Liu Z. Hsiao W. Cantarel B. L. Drábek E. F. Fraser-Liggett C. (2011). Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data. Bioinformatics 27, 3242–3249. doi: 10.1093/bioinformatics/btr547, PMID: 21984758
Liu B. Sträuber H. Saraiva J. Harms H. Silva S. G. Kasmanas J. C. et al. (2022). Machine learning-assisted identification of bioindicators predicts medium-chain carboxylate production performance of an anaerobic mixed culture. Microbiome 10:48. doi: 10.1186/s40168-021-01219-2, PMID: 35331330
Liu S. Zhao W. Liu X. Cheng L. (2020). Metagenomic analysis of the gut microbiome in atherosclerosis patients identify cross-cohort microbial signatures and potential therapeutic target. FASEB J. 34, 14166–14181. doi: 10.1096/fj.202000622R, PMID: 32939880
Lo C. Marculescu R. (2019). MetaNN: accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinform. 20:314. doi: 10.1186/s12859-019-2833-2, PMID: 31216991
Lüll K. Arffman R. K. Sola-Leyva A. Molina N. M. Aasmets O. Herzig K. H. et al. (2021). The gut microbiome in polycystic ovary syndrome and its association with metabolic traits. J. Clin. Endocrinol. Metab. 106, 858–871. doi: 10.1210/clinem/dgaa848, PMID: 33205157
Lundberg S. Lee S.-I. (2017), A Unified Approach to Interpreting Model Predictions.
Ma H. Tan T. W. Ban K. H. K. (2021). A multi-task CNN learning model for taxonomic assignment of human viruses. BMC Bioinform. 22:194. doi: 10.1186/s12859-021-04084-w, PMID: 34078269
Magnúsdóttir S. Heinken A. Kutt L. Ravcheev D. A. Bauer E. Noronha A. et al. (2017). Generation of genome-scale metabolic reconstructions for 773 members of the human gut microbiota. Nat. Biotechnol. 35, 81–89. doi: 10.1038/nbt.3703, PMID: 27893703
Mahé F. Rognes T. Quince C. de Vargas C. Dunthorn M. (2014). Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2:e593. doi: 10.7717/peerj.593, PMID: 25276506
Mallick H. Franzosa E. A. Mclver L. J. Banerjee S. Sirota-Madi A. Kostic A. D. et al. (2019). Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nat. Commun. 10:3136. doi: 10.1038/s41467-019-10927-1, PMID: 31316056
Marcos-Zambrano L. J. Karaduzovic-Hadziabdic K. Loncar Turukalo T. Przymus P. Trajkovik V. Aasmets O. et al. (2021). Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification. Front. Virol. 12:634511. doi: 10.3389/fmicb.2021.634511
Mariette J. Villa-Vialaneix N. (2018). Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics 34, 1009–1015. doi: 10.1093/bioinformatics/btx682, PMID: 29077792
Matsen F. A. Kodner R. B. Armbrust E. V. (2010). pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinform. 11:538. doi: 10.1186/1471-2105-11-538, PMID: 21034504
McDonald D. Hyde E. Debelius J. W. Morton J. T. Gonzalez A. Ackermann G. et al. (2018). American Gut: an Open Platform for Citizen Science Microbiome Research. mSystems 3, e00031–e00018. doi: 10.1128/mSystems.00031-18
McHardy A. C. Martín H. G. Tsirigos A. Hugenholtz P. Rigoutsos I. (2007). Accurate phylogenetic classification of variable-length DNA fragments. Nat. Methods 4, 63–72. doi: 10.1038/nmeth976, PMID: 17179938
Mendes-Soares H. Mundy M. Soares L. M. Chia N. (2016). MMinte: an application for predicting metabolic interactions among the microbial species in a community. BMC Bioinform. 17:343. doi: 10.1186/s12859-016-1230-3, PMID: 27590448
Microbiome Employers (2022). Digital World Biology.
Montassier E. al-Ghalith G. A. Ward T. Corvec S. Gastinne T. Potel G. et al. (2016). Pretreatment gut microbiome predicts chemotherapy-related bloodstream infection. Genome Med. 8:49. doi: 10.1186/s13073-016-0301-4, PMID: 27121964
Moreno-Indias I. Lahti L. Nedyalkova M. Elbere I. Roshchupkin G. Adilovic M. et al. (2021). Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions. Front. Microbiol. 12:635781. doi: 10.3389/fmicb.2021.635781, PMID: 33692771
Nagpal S. Singh R. Taneja B. Mande S. S. (2022). MarkerML – marker feature identification in metagenomic datasets using interpretable machine learning. J. Mol. Biol. 434:167589. doi: 10.1016/j.jmb.2022.167589, PMID: 35662460
Nearing J. T. Comeau A. M. Langille M. G. I. (2021). Identifying biases and their potential solutions in human microbiome studies. Microbiome 9:113. doi: 10.1186/s40168-021-01059-0, PMID: 34006335
Nguyen N.-P. Warnow T. Pop M. White B. (2016). A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity. NPJ Biofilms Microbiomes 2:16004. doi: 10.1038/npjbiofilms.2016.4, PMID: 28721243
Nissen J. N. Johansen J. Allesøe R. L. Sønderby C. K. Armenteros J. J. A. Grønbech C. H. et al. (2021). Improved metagenome binning and assembly using deep variational autoencoders. Nat. Biotechnol. 39, 555–560. doi: 10.1038/s41587-020-00777-4, PMID: 33398153
Noecker C. Eng A. Srinivasan S. Theriot C. M. Young V. B. Jansson J. K. et al. (2016). Metabolic model-based integration of microbiome taxonomic and metabolomic profiles elucidates mechanistic links between ecological and metabolic variation. mSystems 1, e00013–e00015. doi: 10.1128/mSystems.00013-15
Noguchi H. Park J. Takagi T. (2006). MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res. 34, 5623–5630. doi: 10.1093/nar/gkl723, PMID: 17028096
Noguchi H. Taniguchi T. Itoh T. (2008). MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res. 15, 387–396. doi: 10.1093/dnares/dsn027, PMID: 18940874
Oh M. Zhang L. (2020). DeepMicro: deep representation learning for disease prediction based on microbiome data. Sci. Rep. 10:6026. doi: 10.1038/s41598-020-63159-5, PMID: 32265477
Orellana S. C. (2013). Assessment of fungal diversity in the environment using metagenomics:a decade in review. Fungal Genom Biol 3, 1–13. doi: 10.4172/2165-8056.1000110
Orth J. D. Thiele I. Palsson B. Ø. (2010). What is flux balance analysis? Nat. Biotechnol. 28, 245–248. doi: 10.1038/nbt.1614, PMID: 20212490
Pan S. Zhu C. Zhao X. M. Coelho L. P. (2022). A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments. Nat. Commun. 13:2326. doi: 10.1038/s41467-022-29843-y, PMID: 35484115
Parks D. H. MacDonald N. J. Beiko R. G. (2011). Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinform. 12:328. doi: 10.1186/1471-2105-12-328, PMID: 21827705
Pasolli E. Truong D. T. Malik F. Waldron L. Segata N. (2016). Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS Comput. Biol. 12:e1004977. doi: 10.1371/journal.pcbi.1004977, PMID: 27400279
Patil K. R. Roune L. McHardy A. C. (2012). The PhyloPythiaS Web server for taxonomic assignment of metagenome sequences. PLoS One 7:e38581. doi: 10.1371/journal.pone.0038581, PMID: 22745671
Picard M. Scott-Boyer M. P. Bodein A. Périn O. Droit A. (2021). Integration strategies of multi-omics data for machine learning analysis. Comput. Struct. Biotechnol. J. 19, 3735–3746. doi: 10.1016/j.csbj.2021.06.030, PMID: 34285775
Pio G. Mignone P. Magazzù G. Zampieri G. Ceci M. Angione C. (2022). Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction. Bioinformatics 38, 487–493. doi: 10.1093/bioinformatics/btab647, PMID: 34499112
Pragmabio (2022). Available at: http://www.pragmabio.com/.
Qin J. Li Y. Cai Z. Li S. Zhu J. Zhang F. et al. (2012). A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60. doi: 10.1038/nature11450, PMID: 23023125
Qin N. Yang F. Li A. Prifti E. Chen Y. Shao L. et al. (2014). Alterations of the human gut microbiome in liver cirrhosis. Nature 513, 59–64. doi: 10.1038/nature13568
Queirós P. Delogu F. Hickl O. May P. Wilmes P. (2021). Mantis: flexible and consensus-driven genome annotation. GigaScience 10:giab042. doi: 10.1093/gigascience/giab042, PMID: 34076241
Quinn T.P. (2021) Stool Studies Don’t Pass the Sniff Test: A Systematic Review of Human Gut Microbiome Research Suggests Widespread Misuse of Machine LearningarXiv.
Quinn T. P. Erb I. (2020). Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data. NAR Genom. Bioinform 2:lqaa076. doi: 10.1093/nargab/lqaa076, PMID: 33575624
Rahman M. A. Rangwala H. (2018). “RegMIL: Phenotype Classification from Metagenomic Data.” in Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Washington, DC, USA. pp. 145–154.
Ramon E. Belanche-Muñoz L. Molist F. Quintanilla R. Perez-Enciso M. Ramayo-Caldas Y. (2021). kernInt: A Kernel Framework for Integrating Supervised and Unsupervised Analyses in Spatio-Temporal Metagenomic Datasets. Front. Microbiol. 12:609048. doi: 10.3389/fmicb.2021.609048, PMID: 33584612
Rasheed Z. Rangwala H. (2012). Metagenomic taxonomic classification using extreme learning machines. J. Bioinforma. Comput. Biol. 10:1250015. doi: 10.1142/S0219720012500151, PMID: 22849369
Reiman D. Layden B. T. Dai Y. (2021). MiMeNet: exploring microbiome-metabolome relationships using neural networks. PLoS Comput. Biol. 17:e1009021. doi: 10.1371/journal.pcbi.1009021, PMID: 33999922
Reiman D. Metwally A. A. Sun J. Dai Y. (2020). PopPhy-CNN: A phylogenetic tree embedded architecture for convolutional neural networks to predict host phenotype from metagenomic data. IEEE J. Biomed. Health Inform. 24, 2993–3001. doi: 10.1109/JBHI.2020.2993761, PMID: 32396115
Ren J. Song K. Deng C. Ahlgren N. A. Fuhrman J. A. Li Y. et al. (2020). Identifying viruses from metagenomic data using deep learning. Quant. Biol. 8, 64–77. doi: 10.1007/s40484-019-0187-4, PMID: 34084563
Rho M. Tang H. Ye Y. (2010). FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 38:e191. doi: 10.1093/nar/gkq747, PMID: 20805240
Rivera-Pinto J. Egozcue J. J. Pawlowsky-Glahn V. Paredes R. Noguera-Julian M. Calle M. L. (2018). Balances: a new perspective for microbiome analysis. mSystems 3:e00053-18. doi: 10.1128/mSystems.00053-18
Rognes T. Flouri T. Nichols B. Quince C. Mahé F. (2016). VSEARCH: a versatile open source tool for metagenomics. PeerJ 4:e2584. doi: 10.7717/peerj.2584, PMID: 27781170
Rohart F. Eslami A. Matigian N. Bougeard S. Lê Cao K. A. (2017b). MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. BMC Bioinform. 18:128. doi: 10.1186/s12859-017-1553-8, PMID: 28241739
Rohart F. Gautier B. Singh A. Lê Cao K. A. (2017a). mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput. Biol. 13:e1005752. doi: 10.1371/journal.pcbi.1005752, PMID: 29099853
Rosen G. L. Reichenberger E. R. Rosenfeld A. M. (2011). NBC: the Naive Bayes Classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics 27, 127–129. doi: 10.1093/bioinformatics/btq619, PMID: 21062764
Röttjers L. Vandeputte D. Raes J. Faust K. (2021). Null-model-based network comparison reveals core associations. ISME Commun. 1:36. doi: 10.1038/s43705-021-00036-w
Roux S. Faubladier M. Mahul A. Paulhe N. Bernard A. Debroas D. et al. (2011). Metavir: a web server dedicated to virome analysis. Bioinformatics 27, 3074–3075. doi: 10.1093/bioinformatics/btr519, PMID: 21911332
Russell D. J. Way S. F. Benson A. K. Sayood K. (2010). A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences. BMC Bioinform. 11:601. doi: 10.1186/1471-2105-11-601, PMID: 21167044
Sarker I. H. (2021). Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2:160. doi: 10.1007/s42979-021-00592-x, PMID: 33778771
Schloss P. D. Handelsman J. (2005). Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl. Environ. Microbiol. 71, 1501–1506. doi: 10.1128/AEM.71.3.1501-1506.2005, PMID: 15746353
Schloss P. D. Westcott S. L. (2011). Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis. Appl. Environ. Microbiol. 77, 3219–3226. doi: 10.1128/AEM.02810-10, PMID: 21421784
Schloss P. D. Westcott S. L. Ryabin T. Hall J. R. Hartmann M. Hollister E. B. et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541. doi: 10.1128/AEM.01541-09, PMID: 19801464
Segata N. Izard J. Waldron L. Gevers D. Miropolsky L. Garrett W. S. et al. (2011). Metagenomic biomarker discovery and explanation. Genome Biol. 12:R60. doi: 10.1186/gb-2011-12-6-r60, PMID: 21702898
Shang J. Sun Y. (2021). CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning. Methods 189, 95–103. doi: 10.1016/j.ymeth.2020.05.018, PMID: 32454212
Sharpton T. J. (2014). An introduction to the analysis of shotgun metagenomic data. Front. Plant Sci. 5:209. doi: 10.3389/fpls.2014.00209, PMID: 24982662
Singh A. Shannon C. P. Gautier B. Rohart F. Vacher M. Tebbutt S. J. et al. (2019). DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35, 3055–3062. doi: 10.1093/bioinformatics/bty1054, PMID: 30657866
Sokol H. Leducq V. Aschard H. Pham H. P. Jegou S. Landman C. et al. (2017). Fungal microbiota dysbiosis in IBD. Gut 66, 1039–1048. doi: 10.1136/gutjnl-2015-310746, PMID: 26843508
Sommer M. J. Salzberg S. L. (2021). Balrog: a universal protein model for prokaryotic gene prediction. PLoS Comput. Biol. 17:e1008727. doi: 10.1371/journal.pcbi.1008727, PMID: 33635857
Soueidan H. Nikolski M. (2016). Machine learning for metagenomics: methods and tools. arXiv.
Stunnenberg H. G. Hirst M. Abrignani S. Adams D. de Almeida M. Altucci L. et al. (2016). The international human epigenome consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1145–1149. doi: 10.1016/j.cell.2016.11.007, PMID: 27863232
Sun Y. Cai Y. Liu L. Yu F. Farrell M. L. McKendree W. et al. (2009). ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Res. 37:e76. doi: 10.1093/nar/gkp285, PMID: 19417062
Tampuu A. Bzhalava Z. Dillner J. Vicente R. (2019). ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples. PLoS One 14:e0222271. doi: 10.1371/journal.pone.0222271, PMID: 31509583
Tanaseichuk O. Borneman J. Jiang T. (2014). Phylogeny-based classification of microbial communities. Bioinformatics 30, 449–456. doi: 10.1093/bioinformatics/btt700
The 1000 Genomes Project Consortium, Auton A. Abecasis G. R. Altshuler D. M. Durbin R. M. et al. (2015). A global reference for human genetic variation. Nature 526, 68–74. doi: 10.1038/nature15393, PMID: 26432245
The Human Microbiome Project Consortium (2012). Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214. doi: 10.1038/nature11234, PMID: 22699609
Thiele I. Heinken A. Fleming R. M. T. (2013). A systems biology approach to studying the role of microbes in human health. Curr. Opin. Biotechnol. 24, 4–12. doi: 10.1016/j.copbio.2012.10.001
Thomas A. M. Manghi P. Asnicar F. Pasolli E. Armanini F. Zolfo M. et al. (2019). Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat. Med. 25, 667–678. doi: 10.1038/s41591-019-0405-7, PMID: 30936548
Tibshirani R. (1996). Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288.
Topçuoğlu B. Lapp Z. Sovacool K. Snitkin E. Wiens J. Schloss P. (2021). mikropml: user-friendly R package for supervised machine learning pipelines. JOSS 6:3073. doi: 10.21105/joss.03073, PMID: 34414351
Uritskiy G. V. DiRuggiero J. Taylor J. (2018). MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6:158. doi: 10.1186/s40168-018-0541-1, PMID: 30219103
Wang Q. Garrity G. M. Tiedje J. M. Cole J. R. (2007). Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl. Environ. Microbiol. 73, 5261–5267. doi: 10.1128/AEM.00062-07, PMID: 17586664
Wang Z. Wang Z. Lu Y. Y. Sun F. Zhu S. (2019). SolidBin: improving metagenome binning with semi-supervised normalized cut. Bioinformatics 35, 4229–4238. doi: 10.1093/bioinformatics/btz253, PMID: 30977806
Wang X. Yao J. Sun Y. Mai V. (2013). M-pick, a modularity-based method for OTU picking of 16S rRNA sequences. BMC Bioinform. 14:43. doi: 10.1186/1471-2105-14-43, PMID: 23387433
Wei Z.-G. Zhang S.-W. (2015). MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs. Mol. BioSyst. 11, 1907–1913. doi: 10.1039/C5MB00089K, PMID: 25912934
Wei Z.-G. Zhang X. D. Cao M. Liu F. Qian Y. Zhang S. W. (2021). Comparison of methods for picking the operational taxonomic units from amplicon sequences. Front. Microbiol. 12:644012. doi: 10.3389/fmicb.2021.644012, PMID: 33841367
Wei Z.-G. Zhang S. W. Zhang Y. Z. (2017). DMclust, a density-based modularity method for accurate OTU picking of 16S rRNA sequences. Mol. Inform. 36:1600059. doi: 10.1002/minf.201600059, PMID: 28586119
Westcott S. L. Schloss P. D. Watson M. Pollard K. (2017). OptiClust, an improved method for assigning amplicon-based sequence data to operational taxonomic units. mSphere 2:e00073-17. doi: 10.1128/mSphereDirect.00073-17
White J. R. Navlakha S. Nagarajan N. Ghodsi M. R. Kingsford C. Pop M. (2010). Alignment and clustering of phylogenetic markers - implications for microbial diversity studies. BMC Bioinform. 11:152. doi: 10.1186/1471-2105-11-152, PMID: 20334679
Wirbel J. Zych K. Essex M. Karcher N. Kartal E. Salazar G. et al. (2021). Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biol. 22:93. doi: 10.1186/s13059-021-02306-1, PMID: 33785070
Wu G. D. Chen J. Hoffmann C. Bittinger K. Chen Y. Y. Keilbaugh S. A. et al. (2011). Linking long-term dietary patterns with gut microbial enterotypes. Science 334, 105–108. doi: 10.1126/science.1208344, PMID: 21885731
Wu Y.-W. Simmons B. A. Singer S. W. (2016). MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607. doi: 10.1093/bioinformatics/btv638, PMID: 26515820
Yadav M. Chauhan N. S. (2022). Role of gut-microbiota in disease severity and clinical outcomes. Brief. Funct. Genomics. 24:elac037. doi: 10.1093/bfgp/elac037
Yang C. Chowdhury D. Zhang Z. Cheung W. K. Lu A. Bian Z. et al. (2021). A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput. Struct. Biotechnol. J. 19, 6301–6314. doi: 10.1016/j.csbj.2021.11.028, PMID: 34900140
Yang F. Zou Q. (2020). mAML: an automated machine learning pipeline with a microbiome repository for human disease classification. Database (Oxford) 2020:baaa050. doi: 10.1093/database/baaa050
Yin X. Altman T. Rutherford E. West K. A. Wu Y. Choi J. et al. (2020). A comparative evaluation of tools to predict metabolite profiles from microbiome sequencing data. Front. Microbiol. 11:595910. doi: 10.3389/fmicb.2020.595910, PMID: 33343536
Yu J. Feng Q. Wong S. H. Zhang D. Liang Q. Qin Y. et al. (2017). Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 66, 70–78. doi: 10.1136/gutjnl-2015-309800, PMID: 26408641
Zhang J. Bajari R. Andric D. Gerthoffert F. Lepsa A. Nahal-Bose H. et al. (2019). The International cancer genome consortium data portal. Nat. Biotechnol. 37, 367–369. doi: 10.1038/s41587-019-0055-9
Zhang S.-W. Jin X. Y. Zhang T. (2017). Gene prediction in metagenomic fragments with deep learning. Biomed. Res. Int. 2017, 1–9. doi: 10.1155/2017/4740354
Zhang Z. Zhang L. (2021). METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs. BMC Bioinform. 22:378. doi: 10.1186/s12859-021-04284-4, PMID: 34294039
Zhang S.-W. Wei Z.-G. Zhou C. Zhang Y.-C. Zhang T.-H. (2013). “Exploring the interaction patterns in seasonal marine microbial communities with network analysis.” in 2013 7th International Conference on Systems Biology (ISB), Huangshan, China. pp. 63–68.
Zhao Z. Woloszynek S. Agbavor F. Mell J. C. Sokhansanj B. A. Rosen G. L. (2021). Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network. PLoS Comput. Biol. 17:e1009345. doi: 10.1371/journal.pcbi.1009345, PMID: 34550967
Zhu W. Lomsadze A. Borodovsky M. (2010). Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38:e132. doi: 10.1093/nar/gkq275, PMID: 20403810
Zou H. Hastie T. (2005). Regularization and Variable Selection Via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320. doi: 10.1111/j.1467-9868.2005.00503.x