Peer Reviewed
EXPLORING THE MYCOPLASMA-CAPRICOLUM GENOME - A MINIMAL CELL REVEALS ITS PHYSIOLOGY
BORK, P.; OUZOUNIS, C.; CASARI, G. et al

in Molecular Microbiology (1995), 16(5), 955-967

We report on the analysis of 214 kb of the parasitic eubacterium Mycoplasma capricolum sequenced by genomic walking techniques. The 287 putative proteins detected to date represent about half of the estimated total number of 500 predicted for this organism. A large fraction of these (75%) can be assigned a likely function as a result of similarity searches. Several important features of the functional organization of this small genome are already apparent. Among these are (i) the expected relatively large number of enzymes involved in metabolic transport and activation, for efficient use of host cell nutrients; (ii) the presence of anabolic enzymes; (iii) the unexpected diversity of enzymes involved in DNA replication and repair; and (iv) a sizeable number of orthologues (82 so far) in Escherichia coli. This survey is beginning to provide a detailed view of how M. capricolum manages to maintain essential cellular processes with a genome much smaller than that of its bacterial relatives.

Full Text
Peer Reviewed
CHALLENGING TIMES FOR BIOINFORMATICS
CASARI, G.; ANDRADE, M. A.; BORK, P. et al

in Nature (1995), 376(6542), 647-648

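Sequenz und Sequenz-Struktur Vergleiche und deren Anwendung für die Struktur- und Funktionsvorhersage von Proteinen [Sequence and sequence-structure comparisons and their application to the prediction of protein structure and function]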
Schneider, Reinhard UL

Doctoral thesis (1994)

Summary of the inaugural dissertation. Name: Reinhard Schneider. Title: Sequenz und Sequenz-Struktur Vergleiche und deren Anwendung für die Struktur- und Funktionsvorhersage von Proteinen. Supervisor: Prof. Dr. K. C. Holmes (MPI für medizinische Forschung, Heidelberg). As a result of the so-called genome projects, biological sequence databases will grow enormously in the coming years. An indispensable prerequisite for using this raw material is the analysis of these sequence data with computer-based methods. One of the main applications of computers in predicting protein function and structure will be selective database searches for biologically significant similarities. To assess the significance of a protein sequence comparison (alignment), an empirically derived homology threshold was defined. Its most important feature is a strong dependence on the length of the alignment in question. This significance estimate makes it possible both to exclude unrelated proteins and to detect weak sequence relationships. Because the homology threshold is generally valid, it can serve as a simple and efficient additional filter for other methods, such as fast database searches. A new algorithm for multiple sequence alignment with relatively low computational complexity was developed. Its main feature is the derivation of so-called position-dependent conservation weights, which are used as additional parameters in the dynamic programming algorithm and lead to markedly increased sensitivity in database searches. The implementation of the algorithm allows future extension to the comparison of a sequence against a sequence profile, or of two sequence profiles.
So that sensitive database searches can still be run in acceptable computing time in the future, the program was ported to parallel computers. The results show that near-interactive work is possible with the massively parallel computers available today. Building on this work, the profile methods are currently being implemented on latest-generation parallel computers within a European project, and their benefit for industrial "protein design" is being assessed. With the help of the homology threshold, a database of homology-derived protein structures (HSSP) was developed. This database is made publicly available through several channels and has established itself as something of a standard. It is used in automated three-dimensional model building of protein structures and as a tool and data basis for a wide range of statistical and other theoretical work. Use of the database contributed decisively to the development of the currently best program for predicting protein secondary structure. This prediction method is based on a neural network that exploits the information in a multiple sequence alignment. The program developed in this work is used to compute the multiple sequence alignments and to perform the database search they require. The method was made publicly available as a prediction service accessible over international data networks. A new approach to predicting protein structure when there is no sequence relationship to an already known structure is sequence-structure alignment ("threading"). In this method, a three-dimensional structure is described in terms of interatomic contacts, and preference parameters are used to score how well a sequence fits a structure.
The results show that both an improved abstract description of a three-dimensional protein structure and an improved alignment algorithm are needed. An approach that is promising in practice is to use methods that allow an abstract 3D description and also incorporate some degree of sequence information, for example in the form of a sequence profile. A complex functional genome analysis was carried out using a complete yeast chromosome as an example. A number of biologically interesting sequence relationships were uncovered, but doing so required a great deal of work. The main problem turned out to be the insufficient integration of the available methods and heterogeneous databases. The experience gained is currently feeding into the development of an integrated software package that will drastically reduce the work needed to analyze large amounts of data.

Peer Reviewed
THE HSSP DATABASE OF PROTEIN-STRUCTURE SEQUENCE ALIGNMENTS
SANDER, C.; Schneider, Reinhard UL

in Nucleic Acids Research (1994), 22(17), 3597-3599

HSSP (homology-derived structures of proteins) is a derived database merging structural (2-D and 3-D) and sequence information (1-D). For each protein of known 3D structure from the Protein Data Bank, the database has a file with all sequence homologues, properly aligned to the PDB protein. Homologues are very likely to have the same 3D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures.

Peer Reviewed
REDEFINING THE GOALS OF PROTEIN SECONDARY STRUCTURE PREDICTION
ROST, B.; SANDER, C.; Schneider, Reinhard UL

in Journal of Molecular Biology (1994), 235(1), 13-26

Full Text
Peer Reviewed
PHD - AN AUTOMATIC MAIL SERVER FOR PROTEIN SECONDARY STRUCTURE PREDICTION
ROST, B.; SANDER, C.; Schneider, Reinhard UL

in Computer Applications in the Biosciences [=CABIOS] (1994), 10(1), 53-60

By the middle of 1993, more than 30000 protein sequences had been listed. For 1000 of these, the three-dimensional (tertiary) structure has been experimentally solved. Another 7000 can be modelled by homology. For the remaining 21000 sequences, secondary structure prediction provides a rough estimate of structural features. Predictions in three states range between 35% (random) and 88% (homology modelling) overall accuracy. Using information about evolutionary conservation as contained in multiple sequence alignments, the secondary structure of 4700 protein sequences was predicted by the automatic e-mail server PHD. For proteins with at least one known homologue, the method has an expected overall three-state accuracy of 71.4% (evaluated on 126 unique protein chains).

Peer Reviewed
Evolution and Neural Networks – Protein Secondary Structure Prediction Above 71% Accuracy
Rost, B.; Sander, C.; Schneider, Reinhard UL

in Proceedings of the 27th Hawaii International Conference on System Sciences, Vol. V, Biotechnology Computing (1994)

Peer Reviewed
Correlated mutations and residue contacts in proteins
Göbel, U.; Sander, C.; Schneider, Reinhard UL et al

in Proteins (1994), 18

The maintenance of protein function and structure constrains the evolution of amino acid sequences. This fact can be exploited to interpret correlated mutations observed in a sequence family as an indication of probable physical contact in three dimensions. Here we present a simple and general method to analyze correlations in mutational behavior between different positions in a multiple sequence alignment. We then use these correlations to predict contact maps for each of 11 protein families and compare the result with the contacts determined by crystallography. For the most strongly correlated residue pairs predicted to be in contact, the prediction accuracy ranges from 37 to 68% and the improvement ratio relative to a random prediction from 1.4 to 5.1. Predicted contact maps can be used as input for the calculation of protein tertiary structure, either from sequence information alone or in combination with experimental information.

Full Text
Peer Reviewed
GeneQuiz: a workbench for sequence analysis
Scharf, M.; Schneider, Reinhard UL; Casari, G. et al

in Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, ISMB-94 (1994)

We present the prototype of a software system, called GeneQuiz, for large-scale biological sequence analysis. The system was designed to meet the needs that arise in computational sequence analysis and our past experience with the analysis of 171 protein sequences of yeast chromosome III. We explain the cognitive challenges associated with this particular research activity and present our model of the sequence analysis process. The prototype system consists of two parts: (i) the database update and search system (driven by perl programs and rdb, a simple relational database engine also written in perl) and (ii) the visualization and browsing system (developed under C++/ET++). The principal design requirement for the first part was the complete automation of all repetitive actions: database updates, efficient sequence similarity searches and sampling of results in a uniform fashion. The user is then presented with "hit-lists" that summarize the results from heterogeneous database searches. The expert's primary task now simply becomes the further analysis of the candidate entries, where the problem is to extract adequate information about functional characteristics of the query protein rapidly. This second task is tremendously accelerated by a simple combination of the heterogeneous output into uniform relational tables and the provision of browsing mechanisms that give access to database records, sequence entries and alignment views. Indexing of molecular sequence databases provides fast retrieval of individual entries with the use of unique identifiers as well as browsing through databases using pre-existing cross-references. The presentation here covers an overview of the architecture of the system prototype and our experiences on its applicability in sequence analysis.

Peer Reviewed
From Sequence Similarity to Structural Homology of Proteins
Sander, C.; Schneider, Reinhard UL

in Computation of Biomolecular Structures, Achievements, Problems and Perspectives (1993)

MaxHom
Sander, C.; Schneider, Reinhard UL

in The ZEUS Consortium Massively Parallel Computing, Technical Report PC2 / TR-006-94 (1993)

Full Text
Peer Reviewed
THE HSSP DATA-BASE OF PROTEIN STRUCTURE-SEQUENCE ALIGNMENTS
SANDER, C.; Schneider, Reinhard UL

in Nucleic Acids Research (1993), 21(13), 3105-3109

Full Text
Peer Reviewed
PROGRESS IN PROTEIN-STRUCTURE PREDICTION
ROST, B.; Schneider, Reinhard UL; SANDER, C.

in Trends in Biochemical Sciences - Regular Edition (1993), 18(4), 120-123

Prediction of protein secondary structure is an old problem and progress has been slow. Recently, spectacular success has been claimed in the blind prediction of the catalytic subunit of the cAMP-dependent protein kinase. When predictions in this and other test cases are assessed critically, some claims of prediction success turn out to be exaggerated, but a kernel of real progress remains: protein structure prediction can be improved substantially when a family of related sequences is available. Enough so that molecular biologists equipped with a new amino acid sequence and a multiple sequence alignment in hand may be tempted to test the new prediction methods.

Full Text
Peer Reviewed
PREDICTION OF PROTEIN-STRUCTURE BY EVALUATION OF SEQUENCE-STRUCTURE FITNESS - ALIGNING SEQUENCES TO CONTACT PROFILES DERIVED FROM 3-DIMENSIONAL STRUCTURES
OUZOUNIS, C.; SANDER, C.; SCHARF, M. et al

in Journal of Molecular Biology (1993), 232(3), 805-825

Full Text
Peer Reviewed
SELECTION OF REPRESENTATIVE PROTEIN DATA SETS
HOBOHM, U.; SCHARF, M.; Schneider, Reinhard UL et al

in Protein Science : A Publication of the Protein Society (1992), 1(3), 409-417

The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de". The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
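
The two selection strategies described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names and the `is_similar` predicate are hypothetical, and the thinning rule used here (repeatedly dropping the item with the most remaining neighbors) is one plausible reading of "successive thinning out of clusters", assumed for the example.

```python
from typing import Callable, Hashable, List, Sequence


def select_by_ordered_exclusion(
    ordered: Sequence[Hashable],
    is_similar: Callable[[Hashable, Hashable], bool],
) -> List[Hashable]:
    """Walk a list sorted by preference (best first); keep an item only
    if it is not similar to anything already kept."""
    selected: List[Hashable] = []
    for candidate in ordered:
        if all(not is_similar(candidate, kept) for kept in selected):
            selected.append(candidate)
    return selected


def select_by_thinning(
    items: Sequence[Hashable],
    is_similar: Callable[[Hashable, Hashable], bool],
) -> List[Hashable]:
    """Assumed thinning rule: remove the item with the most similar
    neighbors until no similar pair remains."""
    remaining = list(items)
    neighbors = {
        a: {b for b in remaining if b != a and is_similar(a, b)}
        for a in remaining
    }
    while remaining:
        # Ties are broken by position in the input list.
        worst = max(remaining, key=lambda x: len(neighbors[x]))
        if not neighbors[worst]:
            break  # no similar pairs left
        remaining.remove(worst)
        for other in neighbors.pop(worst):
            neighbors[other].discard(worst)
    return remaining
```

In this sketch the first function favors whatever property the input ordering encodes (for example, experimental resolution), while the second tends to keep more items because it removes the most connected ones first. In practice `is_similar` would wrap a sequence-identity cutoff such as the one quoted in the abstract.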

Full Text
Peer Reviewed
COMPREHENSIVE SEQUENCE-ANALYSIS OF THE 182 PREDICTED OPEN READING FRAMES OF YEAST CHROMOSOME-III
BORK, P.; OUZOUNIS, C.; SANDER, C. et al

in Protein Science : A Publication of the Protein Society (1992), 1(12), 1677-1690

With the completion of the first phase of the European yeast genome sequencing project, the complete DNA sequence of chromosome III of Saccharomyces cerevisiae has become available (Oliver, S.G., et al., 1992, Nature 357, 38-46). We have tested the predictive power of computer sequence analysis on the 176 probable protein products of this chromosome, after exclusion of six problem cases. When the results of database similarity searches are pooled with prior knowledge, a likely function can be assigned to 42% of the proteins, and a predicted three-dimensional structure to a third of these (14% of the total). The function of the remaining 58% remains to be determined. Of these, about one-third have one or more probable transmembrane segments. Among the most interesting proteins with predicted functions are a new member of the type X polymerase family, a transcription factor with an N-terminal DNA-binding domain related to GAL4, a "fork head" DNA-binding domain previously known only in Drosophila and in mammals, and a putative methyltransferase. Our analysis increased the number of known significant sequence similarities on chromosome III by 13, to now 67. Although the near 40% success rate of identifying unknown protein function by sequence analysis is surprisingly high, the information gap between known protein sequences and unknown function is expected to widen and become a major bottleneck of genome projects in the near future. Based on the experience gained in this test study, we suggest that the development of an automated computer workbench for protein sequence analysis must be an important item in genome projects.

Full Text
Peer Reviewed
WHAT'S IN A GENOME?
BORK, P.; OUZOUNIS, C.; SANDER, C. et al

in Nature (1992), 358(6384), 287-287

Full Text
Peer Reviewed
DATABASE OF HOMOLOGY-DERIVED PROTEIN STRUCTURES AND THE STRUCTURAL MEANING OF SEQUENCE ALIGNMENT
SANDER, C.; Schneider, Reinhard UL

in Proteins (1991), 9(1), 56-68

The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
