Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
All documents in ORBilu are protected by a user license.
Pathogenic microorganisms cause disease by invading, colonizing, and damaging their host. Virulence factors including bacterial toxins contribute to pathogenicity. Additionally, antimicrobial resistance genes allow pathogens to evade otherwise curative treatments. To understand causal relationships between microbiome compositions, functioning, and disease, it is essential to identify virulence factors and antimicrobial resistance genes in situ. At present, there is a clear lack of computational approaches to simultaneously identify these factors in metagenomic datasets.
Here, we present PathoFact, a tool for the contextualized prediction of virulence factors, bacterial toxins, and antimicrobial resistance genes with high accuracy (0.921, 0.832 and 0.979, respectively) and specificity (0.957, 0.989 and 0.994). We evaluate the performance of PathoFact on simulated metagenomic datasets and perform a comparison to two other general workflows for the analysis of metagenomic data. PathoFact outperforms all existing workflows in predicting virulence factors and toxin genes. It performs comparably to one pipeline regarding the prediction of antimicrobial resistance while outperforming the others. We further demonstrate the performance of PathoFact on three publicly available case-control metagenomic datasets representing an actual infection as well as chronic diseases in which either pathogenic potential or bacterial toxins are hypothesized to play a role. In each case, we identify virulence factors and AMR genes which differentiated between the case and control groups, thereby revealing novel gene associations with the studied diseases.
PathoFact is an easy-to-use, modular, and reproducible pipeline for the identification of virulence factors, bacterial toxins, and antimicrobial resistance genes in metagenomic data. Additionally, our tool combines the prediction of these pathogenicity factors with the identification of mobile genetic elements. This provides further depth to the analysis by considering the genomic context of the pertinent genes. Furthermore, PathoFact’s modules for virulence factors, toxins, and antimicrobial resistance genes can be applied independently, thereby making it a flexible and versatile tool. PathoFact, its models, and databases are freely available at https://pathofact.lcsb.uni.lu.