Scientific journals : Article
Life sciences : Biotechnology
Systems Biomedicine
A procedure to recruit members to enlarge protein family databases--the building of UECOG (UniRef-Enriched COG Database) as a model.
Fernandes, G. R.
Barbosa Da Silva, Adriano [University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)]
Prosdocimi, F.
Pena, I. A.
Santana-Santos, L.
Coelho Junior, O.
Barbosa-Silva, A.
Velloso, H. M.
Mudado, M. A.
Natale, D. A.
Faria-Campos, A. C.
Aguiar, S. C. V.
Ortega, J. M.
Genetics and Molecular Research
Yes (verified by ORBilu)
[en] Computational Biology/methods ; Databases, Protein ; Reproducibility of Results
[en] A procedure to recruit members to enlarge protein family databases is described here. The procedure makes use of UniRef50 clusters produced by UniProt. Current family entries are used to recruit additional members based on the UniRef50 clusters to which they belong. Only those additional UniRef50 members that are not fragments and whose length is within a restricted range relative to the original entry are recruited. The enriched dataset is then limited to contain only genomes from selected clades. We used the COG database, which is used for genome annotation and for studies of phylogenetics and gene evolution, as a model. To validate the method, we tested a UniRef-Enriched COG0151 (UECOG) with six distinct procedures that compare recruited members with their recruiters: PSI-BLAST, secondary structure overlap (SOV), Seed Linkage, COGnitor, shared domain content, and neighbor-joining single-linkage. We observed that the first four agree in their validations. Presently, the UniRef50-based recruitment procedure enriches the COG database for Archaea, Bacteria, and the bacterial subgroups Actinobacteria, Firmicutes, Proteobacteria, and other bacteria by 2.2-, 8.0-, 7.0-, 8.8-, 8.7-, and 4.2-fold, respectively, in terms of sequences, and also considerably increases the number of species.
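
The recruitment steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Protein` record, the `recruit` function, and the length bounds `min_ratio` and `max_ratio` are hypothetical, since the abstract does not state the actual length range or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical record for one sequence; field names are assumptions."""
    accession: str
    length: int
    is_fragment: bool
    clade: str
    cluster_id: str  # UniRef50 cluster the sequence belongs to


def recruit(family, candidates, allowed_clades, min_ratio=0.8, max_ratio=1.2):
    """Recruit candidates that share a UniRef50 cluster with a family member.

    min_ratio and max_ratio are illustrative values only; the abstract says
    the length must lie in a restricted range relative to the original entry
    but does not give the bounds.
    """
    # Group the current family entries (the recruiters) by UniRef50 cluster.
    recruiters = {}
    for member in family:
        recruiters.setdefault(member.cluster_id, []).append(member)

    known = {member.accession for member in family}
    enriched = []
    for cand in candidates:
        # Skip sequences already in the family and sequences flagged as fragments.
        if cand.accession in known or cand.is_fragment:
            continue
        # Keep the candidate if its length is close enough to any recruiter
        # from the same cluster.
        for rec in recruiters.get(cand.cluster_id, []):
            if min_ratio * rec.length <= cand.length <= max_ratio * rec.length:
                enriched.append(cand)
                break

    # Final step: limit the enriched set to the selected clades.
    return [p for p in enriched if p.clade in allowed_clades]
```

Applying the clade filter last mirrors the order given in the abstract, where the enriched dataset is restricted to selected clades after recruitment.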

File(s) associated to this reference

Fulltext file(s):

Open access
uecog.pdf (Publisher postprint, 866.77 kB)

All documents in ORBilu are protected by a user license.