Article (Scientific journals)
SELECTION OF REPRESENTATIVE PROTEIN DATA SETS
HOBOHM, U.; SCHARF, M.; SCHNEIDER, Reinhard et al.
1992. In Protein Science: A Publication of the Protein Society, 1 (3), p. 409-417
Peer reviewed, verified by ORBi
 

Details

Keywords:
NMR; PROTEIN DATA SETS; X-RAY CRYSTALLOGRAPHY
Abstract:
The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de". The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
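The two selection strategies summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that similarity has already been reduced to a symmetric "neighbor" relation between chains, for example by applying a sequence-identity cutoff. The function names (`build_graph`, `select_from_ordered_list`, `thin_out_clusters`) are hypothetical. The rule used for thinning, which removes the chain with the most remaining neighbors first, is an assumption; the abstract only speaks of "successive thinning out of clusters".

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(chains: Iterable[str],
                similar_pairs: Iterable[Tuple[str, str]]) -> Graph:
    """Build a symmetric neighbor map from pairs judged 'too similar'."""
    graph: Graph = {chain: set() for chain in chains}
    for a, b in similar_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def select_from_ordered_list(ordered: Iterable[str], graph: Graph) -> List[str]:
    """First strategy: walk a priority-ordered list, keep each chain that
    has not been excluded yet, then exclude all of its neighbors."""
    selected: List[str] = []
    excluded: Set[str] = set()
    for chain in ordered:
        if chain in excluded:
            continue
        selected.append(chain)
        excluded.update(graph[chain])
    return selected


def thin_out_clusters(graph: Graph) -> List[str]:
    """Second strategy (assumed removal rule): repeatedly drop the chain
    with the most remaining neighbors until no similar pair is left."""
    remaining: Graph = {chain: set(nbrs) for chain, nbrs in graph.items()}
    while remaining:
        # Ties are broken by name so the result is deterministic.
        worst = max(remaining, key=lambda c: (len(remaining[c]), c))
        if not remaining[worst]:
            break  # every surviving chain is now free of neighbors
        for other in remaining.pop(worst):
            remaining[other].discard(worst)
    return sorted(remaining)


# Toy example: chain B is similar to A, C and D; chain E is unrelated.
graph = build_graph("ABCDE", [("A", "B"), ("B", "C"), ("B", "D")])
by_priority = select_from_ordered_list(["B", "A", "C", "D", "E"], graph)
by_thinning = thin_out_clusters(graph)
```

In this toy case the ordered-list strategy keeps the top-ranked chain B and therefore loses A, C and D, while thinning drops B and keeps the other four. This mirrors the trade-off stated in the abstract between optimizing a property of the selected proteins and maximizing the size of the set.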
Research center:
- Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group)
Disciplines:
Biochemistry, biophysics & molecular biology
Identifiers:
UNILU:UL-ARTICLE-2012-018
Author, co-author:
HOBOHM, U.
SCHARF, M.
SCHNEIDER, Reinhard; European Molecular Biology Laboratory - EMBL
SANDER, C.
Document language:
English
Title:
SELECTION OF REPRESENTATIVE PROTEIN DATA SETS
Publication date:
1992
Journal title:
Protein Science: A Publication of the Protein Society
ISSN:
0961-8368
eISSN:
1469-896X
Publisher:
Cold Spring Harbor Laboratory Press, Woodbury, United States - New York
Volume:
1
Issue:
3
Pages:
409-417
Peer reviewed:
Peer reviewed, verified by ORBi
Available on ORBilu:
since 30 June 2014

