References of "SCHARF, M."
Peer Reviewed
GeneQuiz: a workbench for sequence analysis
Scharf, M.; Schneider, Reinhard UL; Casari, G. et al

in Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, ISMB-94 (1994)

We present the prototype of a software system, called GeneQuiz, for large-scale biological sequence analysis. The system was designed to meet the needs that arise in computational sequence analysis and our past experience with the analysis of 171 protein sequences of yeast chromosome III. We explain the cognitive challenges associated with this particular research activity and present our model of the sequence analysis process. The prototype system consists of two parts: (i) the database update and search system (driven by perl programs and rdb, a simple relational database engine also written in perl) and (ii) the visualization and browsing system (developed under C++/ET++). The principal design requirement for the first part was the complete automation of all repetitive actions: database updates, efficient sequence similarity searches and sampling of results in a uniform fashion. The user is then presented with "hit-lists" that summarize the results from heterogeneous database searches. The expert's primary task now simply becomes the further analysis of the candidate entries, where the problem is to extract adequate information about functional characteristics of the query protein rapidly. This second task is tremendously accelerated by a simple combination of the heterogeneous output into uniform relational tables and the provision of browsing mechanisms that give access to database records, sequence entries and alignment views. Indexing of molecular sequence databases provides fast retrieval of individual entries with the use of unique identifiers as well as browsing through databases using pre-existing cross-references. The presentation here covers an overview of the architecture of the system prototype and our experiences on its applicability in sequence analysis.

Peer Reviewed
PREDICTION OF PROTEIN-STRUCTURE BY EVALUATION OF SEQUENCE-STRUCTURE FITNESS - ALIGNING SEQUENCES TO CONTACT PROFILES DERIVED FROM 3-DIMENSIONAL STRUCTURES
OUZOUNIS, C.; SANDER, C.; SCHARF, M. et al

in Journal of Molecular Biology (1993), 232(3), 805-825

Peer Reviewed
COMPREHENSIVE SEQUENCE-ANALYSIS OF THE 182 PREDICTED OPEN READING FRAMES OF YEAST CHROMOSOME-III
BORK, P.; OUZOUNIS, C.; SANDER, C. et al

in Protein Science: A Publication of the Protein Society (1992), 1(12), 1677-1690

With the completion of the first phase of the European yeast genome sequencing project, the complete DNA sequence of chromosome III of Saccharomyces cerevisiae has become available (Oliver, S.G., et al., 1992, Nature 357, 38-46). We have tested the predictive power of computer sequence analysis on the 176 probable protein products of this chromosome, after exclusion of six problem cases. When the results of database similarity searches are pooled with prior knowledge, a likely function can be assigned to 42% of the proteins, and a predicted three-dimensional structure to a third of these (14% of the total). The function of the remaining 58% remains to be determined. Of these, about one-third have one or more probable transmembrane segments. Among the most interesting proteins with predicted functions are a new member of the type X polymerase family, a transcription factor with an N-terminal DNA-binding domain related to GAL4, a "fork head" DNA-binding domain previously known only in Drosophila and in mammals, and a putative methyltransferase. Our analysis increased the number of known significant sequence similarities on chromosome III by 13, to now 67. Although the near 40% success rate of identifying unknown protein function by sequence analysis is surprisingly high, the information gap between known protein sequences and unknown function is expected to widen and become a major bottleneck of genome projects in the near future. Based on the experience gained in this test study, we suggest that the development of an automated computer workbench for protein sequence analysis must be an important item in genome projects.

Peer Reviewed
SELECTION OF REPRESENTATIVE PROTEIN DATA SETS
HOBOHM, U.; SCHARF, M.; Schneider, Reinhard UL et al

in Protein Science: A Publication of the Protein Society (1992), 1(3), 409-417

The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de". The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
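
The two selection strategies described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the similarity relation is already given as a symmetric neighbor map (hypothetical name `neighbors`) in which every protein appears as a key, and it assumes that "thinning out" means repeatedly removing the protein with the most remaining neighbors. The function names and the toy data are illustrative only.

```python
from typing import Dict, List, Set

# Assumed input: a symmetric map from each protein ID to the set of IDs
# considered "similar" to it (above the chosen sequence-identity cutoff).
NeighborMap = Dict[str, Set[str]]


def select_from_ordered_list(ordered: List[str], neighbors: NeighborMap) -> List[str]:
    """Walk a priority-ordered list; keep each protein that is still
    available and exclude all of its neighbors from later selection."""
    excluded: Set[str] = set()
    selected: List[str] = []
    for protein in ordered:
        if protein in excluded:
            continue
        selected.append(protein)
        excluded.add(protein)
        excluded.update(neighbors.get(protein, ()))
    return selected


def thin_out_clusters(neighbors: NeighborMap) -> List[str]:
    """Repeatedly remove the protein with the most remaining neighbors
    (an assumed heuristic) until no two remaining proteins are similar."""
    remaining = {p: set(ns) for p, ns in neighbors.items()}
    while True:
        # Ties are broken by ID so the result is deterministic.
        worst = max(remaining, key=lambda p: (len(remaining[p]), p), default=None)
        if worst is None or not remaining[worst]:
            break
        for other in remaining.pop(worst):
            remaining[other].discard(worst)
    return sorted(remaining)


# Toy example: five proteins whose similarities form a chain a-b-c-d-e.
toy_neighbors: NeighborMap = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

# Ordered by some quality criterion (hypothetical ordering).
by_quality = ["b", "d", "a", "c", "e"]

print(select_from_ordered_list(by_quality, toy_neighbors))  # ['b', 'd']
print(thin_out_clusters(toy_neighbors))                     # ['a', 'c', 'e']
```

On this toy chain the ordered-list strategy keeps the two highest-priority proteins, while the thinning strategy retains three, which matches the stated trade-off between optimizing a property of the selected proteins and maximizing the size of the set.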

Peer Reviewed
WHAT'S IN A GENOME?
BORK, P.; OUZOUNIS, C.; SANDER, C. et al

in Nature (1992), 358(6384), 287-287
