; Schneider, Reinhard in Nucleic Acids Research (1998), 26(1), 313-315
HSSP (http://www.sander.embl-ebi.ac.uk/hssp/) is a derived database merging structure (3-D) and sequence (1-D) information. For each protein of known 3D structure from the Protein Data Bank (PDB), we provide a multiple sequence alignment of putative homologues and a sequence profile characteristic of the protein family, centered on the known structure. The list of homologues is the result of an iterative database search in SWISS-PROT using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed putative homologues are very likely to have the same 3D structure as the PDB protein to which they have been aligned. As a result, the database not only provides aligned sequence families, but also implies secondary and tertiary structures covering 33% of all sequences in SWISS-PROT.

; Schneider, Reinhard in Journal of Molecular Biology (1997), 270(3), 471-480
In fold recognition by threading one takes the amino acid sequence of a protein and evaluates how well it fits into one of the known three-dimensional (3D) protein structures. The quality of sequence-structure fit is typically evaluated using inter-residue potentials of mean force or other statistical parameters. Here, we present an alternative approach to evaluating sequence-structure fitness.
Starting from the amino acid sequence we first predict secondary structure and solvent accessibility for each residue. We then thread the resulting one-dimensional (1D) profile of predicted structure assignments into each of the known 3D structures. The optimal threading for each sequence-structure pair is obtained using dynamic programming. The overall best sequence-structure pair constitutes the predicted 3D structure for the input sequence. The method is fine-tuned by adding information from direct sequence-sequence comparison and applying a series of empirical filters. Although the method relies on reduction of 3D information into 1D structure profiles, its accuracy is, surprisingly, not clearly inferior to methods based on evaluation of residue interactions in 3D. We therefore hypothesise that existing 1D-3D threading methods essentially do not capture more than the fitness of an amino acid sequence for a particular 1D succession of secondary structure segments and residue solvent accessibility. The prediction-based threading method on average finds any structurally homologous region at first rank in 29% of the cases (including sequence information). For the 22% first hits detected at highest scores, the expected accuracy rose to 75%. However, the task of detecting entire folds rather than homologous fragments was managed much better; 45 to 75% of the first hits correctly recognised the fold.

; ; et al in Computer Applications in the Biosciences [=CABIOS] (1997), 13(4), 481-483

; ; et al in Journal of Yeast and Fungal Research (1997), 13(14), 1363-1374
We have analysed short open reading frames (between 150 and 300 base pairs long) of the yeast genome (Saccharomyces cerevisiae) with a two-step strategy. The first step selects a candidate set of open reading frames from the DNA sequence based on statistical evaluation of DNA and protein sequence properties. The second step filters the candidate set by selecting open reading frames with high similarity to other known sequences (from any organism). As a result, we report ten new predicted proteins not present in the current sequence databases. These include a new alcohol dehydrogenase, a protein probably related to the cell cycle, as well as a homolog of the prokaryotic ribosomal protein L36 likely to be a mitochondrial ribosomal protein coded in the nuclear genome. We conclude that the analysis of short open reading frames leads to biologically interesting discoveries, even though the quantitative yield of new proteins is relatively low.

; Schneider, Reinhard in WWW-publication (1997)
Over the past few years our means of communication have changed rapidly due to the growth of the World Wide Web (WWW). The Web enables molecular biologists to immediately access databases, scan literature, find information about related research and researchers, and to trace cell cultures. Wet-lab biologists can uncover information about the protein of interest without having to become experts in sequence analysis. Here, we present a variety of tools, provide an overview of the state of the art in sequence analysis, and describe some of the principles of the methods.

Schneider, Reinhard in Nucleic Acids Research (1997), 25(1), 226-230
HSSP is a derived database merging structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in SwissProt using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 29% of all SwissProt-stored sequences.

; ; et al in Trends in Genetics (1996), 128(7), 244-245
Scientific history was made in completing the yeast genome sequence, yet its 13 Mb are a mere starting point. Two challenges loom large: to decipher the function of all genes and to describe the workings of the eukaryotic cell in full molecular detail. A combination of experimental and theoretical approaches will be brought to bear on these challenges. What will be next in yeast genome analysis from the point of view of bioinformatics?

Schneider, Reinhard in Supercomputer 96: Anwendungen, Architekturen, Trends (1996)
Analyzing genomic data is a computationally intensive and complicated process in which scientists must typically choose among multiple databases and analysis methods and make expert judgements by inspecting multiple results. GeneQuiz, an automated software system for large-scale genome analysis developed at the EMBL/EBI, tackles this problem by using an automated, rigorous, rule-based system to select among the results of sequence analysis and database searches; it builds informative annotation and aims at predicting the function of new genes. In a demonstration project more than 6000 proteins from baker's yeast, whose genomic sequence was completed in 1996, were analyzed on a Silicon Graphics POWERCHALLENGEarray with 64 processors (R8000 @90 MHz) so that the analysis could be completed in 3 days. The results of the analysis were published on two web servers as they were computed.

Schneider, Reinhard in Nucleic Acids Research (1996), 24(1), 201-205
HSSP is a derived database merging structural three-dimensional (3-D) and sequence one-dimensional (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family.
The list of homologues is the result of a database search in Swissprot using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 27% of all Swissprot-stored sequences.

Schneider, Reinhard in Statustagung des BMBF, HPSC 95, Stand und Perspektiven des Parallelen Höchstleistungsrechnens und seiner Anwendungen (1995)
Sequence comparison has become an essential and standard tool in the analysis of genomic data. Genome projects will decipher much of the genetic information in many organisms, including humans. As a result, the computational cost of databank searches will increase dramatically. In addition, the search for biologically meaningful homology between a newly determined sequence and sequences already stored in the various databanks becomes increasingly important as most of the new data will be in raw, not yet understood form. The detection of sufficient similarity between a newly determined sequence and a protein of known function or even known 3D structure in a databank allows one to transfer most of the knowledge from one sequence to the other. The result can be enormous savings in genetic and biochemical laboratory efforts.

; ; et al in Molecular Microbiology (1995), 16(5), 955-967
We report on the analysis of 214 kb of the parasitic eubacterium Mycoplasma capricolum sequenced by genomic walking techniques. The 287 putative proteins detected to date represent about half of the estimated total number of 500 predicted for this organism. A large fraction of these (75%) can be assigned a likely function as a result of similarity searches. Several important features of the functional organization of this small genome are already apparent. Among these are (i) the expected relatively large number of enzymes involved in metabolic transport and activation, for efficient use of host cell nutrients; (ii) the presence of anabolic enzymes; (iii) the unexpected diversity of enzymes involved in DNA replication and repair; and (iv) a sizeable number of orthologues (82 so far) in Escherichia coli. This survey is beginning to provide a detailed view of how M. capricolum manages to maintain essential cellular processes with a genome much smaller than that of its bacterial relatives.

; ; et al in Nature (1995), 376(6542), 647-648

; Schneider, Reinhard in Nucleic Acids Research (1994), 22(17), 3597-3599
HSSP (homology-derived structures of proteins) is a derived database merging structural (2-D and 3-D) and sequence information (1-D).
For each protein of known 3D structure from the Protein Data Bank, the database has a file with all sequence homologues, properly aligned to the PDB protein. Homologues are very likely to have the same 3D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures.

; ; Schneider, Reinhard in Journal of Molecular Biology (1994), 235(1), 13-26

; Schneider, Reinhard in Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, ISMB-94 (1994)
We present the prototype of a software system, called GeneQuiz, for large-scale biological sequence analysis. The system was designed to meet the needs that arise in computational sequence analysis and our past experience with the analysis of 171 protein sequences of yeast chromosome III. We explain the cognitive challenges associated with this particular research activity and present our model of the sequence analysis process. The prototype system consists of two parts: (i) the database update and search system (driven by perl programs and rdb, a simple relational database engine also written in perl) and (ii) the visualization and browsing system (developed under C++/ET++). The principal design requirement for the first part was the complete automation of all repetitive actions: database updates, efficient sequence similarity searches and sampling of results in a uniform fashion. The user is then presented with "hit-lists" that summarize the results from heterogeneous database searches.
The expert's primary task now simply becomes the further analysis of the candidate entries, where the problem is to extract adequate information about functional characteristics of the query protein rapidly. This second task is tremendously accelerated by a simple combination of the heterogeneous output into uniform relational tables and the provision of browsing mechanisms that give access to database records, sequence entries and alignment views. Indexing of molecular sequence databases provides fast retrieval of individual entries with the use of unique identifiers as well as browsing through databases using pre-existing cross-references. The presentation here covers an overview of the architecture of the system prototype and our experiences on its applicability in sequence analysis.

; ; Schneider, Reinhard in Proceedings of the 27th Hawaii International Conference on System Sciences, Vol. V, Biotechnology Computing (1994)

; ; Schneider, Reinhard in Proteins (1994), 18
The maintenance of protein function and structure constrains the evolution of amino acid sequences. This fact can be exploited to interpret correlated mutations observed in a sequence family as an indication of probable physical contact in three dimensions. Here we present a simple and general method to analyze correlations in mutational behavior between different positions in a multiple sequence alignment. We then use these correlations to predict contact maps for each of 11 protein families and compare the result with the contacts determined by crystallography.
For the most strongly correlated residue pairs predicted to be in contact, the prediction accuracy ranges from 37 to 68% and the improvement ratio relative to a random prediction from 1.4 to 5.1. Predicted contact maps can be used as input for the calculation of protein tertiary structure, either from sequence information alone or in combination with experimental information.

; ; Schneider, Reinhard in Computer Applications in the Biosciences [=CABIOS] (1994), 10(1), 53-60
By the middle of 1993, more than 30000 protein sequences had been listed. For 1000 of these, the three-dimensional (tertiary) structure has been experimentally solved. Another 7000 can be modelled by homology. For the remaining 21000 sequences, secondary structure prediction provides a rough estimate of structural features. Predictions in three states range between 35% (random) and 88% (homology modelling) overall accuracy. Using information about evolutionary conservation as contained in multiple sequence alignments, the secondary structure of 4700 protein sequences was predicted by the automatic e-mail server PHD. For proteins with at least one known homologue, the method has an expected overall three-state accuracy of 71.4% (evaluated on 126 unique protein chains).

; ; et al in Journal of Molecular Biology (1993), 232(3), 805-825

; Schneider, Reinhard in The ZEUS Consortium Massively Parallel Computing, Technical Report PC2 / TR-006-94 (1993)