Peer Reviewed
Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery
Ofran, Y.; Punta, M.; Schneider, Reinhard UL et al

in Drug Discovery Today (2005), 10(21), 1475-1482

Every entirely sequenced genome reveals 100s to 1000s of protein sequences for which the only annotation available is 'hypothetical protein'. Thus, in the human genome and in the genomes of pathogenic agents there could be 1000s of potential, unexplored drug targets. Computational prediction of protein function can play a role in studying these targets. We shall review the challenges, research approaches and recently developed tools in the field of computational function-prediction and we will discuss the ways these issues can change the process of drug discovery.

Improving Research Productivity at a Pharmaceutical Company
Ramakrishnan, S.; Caruso, A.; Schneider, Reinhard UL

in LION bioscience AG White Paper (2002)

Bioinformatik: Verloren im Datendschungel? [Bioinformatics: lost in the data jungle?]
Schneider, Reinhard UL

in Nachrichten aus der Chemie: Zeitschrift der Gesellschaft Deutscher Chemiker (2000), 48(5), 622-625

Even insiders are likely to have been surprised by the rapid development of bioinformatics in recent years. In the process, bioinformatics has evolved from an "ivory tower science" into a strongly application-oriented discipline. High-throughput technologies and the resulting quantity of data on the one hand, and strong industry demand for analysis of those data on the other, have proven to be the driving forces behind the boom in bioinformatics. For example, one of the world's largest "supercomputers", for decades the domain of physicists or meteorologists, is currently being installed at Celera, a company that is decoding the human genome, and "trained" bioinformaticians are almost as hard to find on the job market as truffles in the forest. In this short article I will confine myself to the current and future challenges in life science informatics in the context of R&D efforts, and leave aside the core area of the perhaps more academically oriented bioinformatics, such as algorithm development for the numerous prediction methods and database searches.

Datenexplosion erschwert Bioforschung [Data explosion hampers bioresearch]
Schneider, Reinhard UL

in Frankfurter Allgemeine Zeitung (2000)

Peer Reviewed
INTERPRO
Apweiler, R.; Attwood, T. K.; Bairoch, A. et al

in Bioinformatics (2000)

InterPro is a new integrated documentation resource for protein families, domains and functional sites, developed as a means of rationalising the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Merged annotations from PRINTS, PROSITE and Pfam form the InterPro core. Each combined InterPro entry includes functional descriptions and literature references, and links are made back to the relevant parent database(s), allowing users to see at a glance whether a particular family or domain has associated patterns, profiles, fingerprints, etc. Merged and individual entries (i.e., those that have no counterpart in the companion resources) are assigned unique accession numbers. The first release of InterPro contains around 2,400 entries, representing families, domains, repeats and sites of post-translational modification (PTMs) encoded by 4,300 regular expressions, profiles, fingerprints and Hidden Markov Models (HMMs). Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 370,000 hits in total). The database is accessible for text-based searches at http://www.ebi.ac.uk/interpro/.

Functional Genome Analysis
Schneider, Reinhard UL

in Proceedings zur Tagung Hoechstleistungsrechnen in der Chemie. Tagung fuer industrielle Anwender (1998)

Scientific history is made in sequencing complete genomes. Two challenges loom large: decipher the function of all genes and describe the workings of the eukaryotic cell in full molecular detail! A combination of experimental and theoretical approaches will be brought to bear on these challenges. What's next in genome analysis from the point of view of bioinformatics?

Peer Reviewed
The HSSP database of protein structure sequence alignments and family profiles
Dodge, C.; Schneider, Reinhard UL; Sander, C.

in Nucleic Acids Research (1998), 26(1), 313-315

HSSP (http://www.sander.embl-ebi.ac.uk/hssp/) is a derived database merging structure (3-D) and sequence (1-D) information. For each protein of known 3D structure from the Protein Data Bank (PDB), we provide a multiple sequence alignment of putative homologues and a sequence profile characteristic of the protein family, centered on the known structure. The list of homologues is the result of an iterative database search in SWISS-PROT using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed putative homologues are very likely to have the same 3D structure as the PDB protein to which they have been aligned. As a result, the database not only provides aligned sequence families, but also implies secondary and tertiary structures covering 33% of all sequences in SWISS-PROT.

Peer Reviewed
The HSSP database of protein structure-sequence alignments
Schneider, Reinhard UL; deDaruvar, A.; Sander, C.

in Nucleic Acids Research (1997), 25(1), 226-230

HSSP is a derived database merging structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in SwissProt using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 29% of all SwissProt-stored sequences.

Full Text
Peer Reviewed
Protein fold recognition by prediction-based threading
Rost, B.; Schneider, Reinhard UL; Sander, C.

in Journal of Molecular Biology (1997), 270(3), 471-480

In fold recognition by threading one takes the amino acid sequence of a protein and evaluates how well it fits into one of the known three-dimensional (3D) protein structures. The quality of sequence-structure fit is typically evaluated using inter-residue potentials of mean force or other statistical parameters. Here, we present an alternative approach to evaluating sequence-structure fitness. Starting from the amino acid sequence we first predict secondary structure and solvent accessibility for each residue. We then thread the resulting one-dimensional (1D) profile of predicted structure assignments into each of the known 3D structures. The optimal threading for each sequence-structure pair is obtained using dynamic programming. The overall best sequence-structure pair constitutes the predicted 3D structure for the input sequence. The method is fine-tuned by adding information from direct sequence-sequence comparison and applying a series of empirical filters. Although the method relies on reduction of 3D information into 1D structure profiles, its accuracy is, surprisingly, not clearly inferior to methods based on evaluation of residue interactions in 3D. We therefore hypothesise that existing 1D-3D threading methods essentially do not capture more than the fitness of an amino acid sequence for a particular 1D succession of secondary structure segments and residue solvent accessibility. The prediction-based threading method on average finds any structurally homologous region at first rank in 29% of the cases (including sequence information). For the 22% first hits detected at highest scores, the expected accuracy rose to 75%. However, the task of detecting entire folds rather than homologous fragments was managed much better; 45 to 75% of the first hits correctly recognised the fold.
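
The threading step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each residue is reduced to a single secondary-structure letter (H, E or L), it ignores solvent accessibility, and the scoring values and the names `thread_score` and `best_fold` are hypothetical. It uses plain global alignment (Needleman-Wunsch) as the dynamic programming step.

```python
def thread_score(predicted, template, match=1.0, mismatch=-1.0, gap=-2.0):
    """Global alignment score between a predicted 1D state string and the
    1D state string of a known structure (scoring values are assumptions)."""
    n, m = len(predicted), len(template)
    # score[i][j] = best score aligning predicted[:i] with template[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if predicted[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align the two positions
                score[i - 1][j] + gap,    # gap in the template
                score[i][j - 1] + gap,    # gap in the predicted profile
            )
    return score[n][m]


def best_fold(predicted, library):
    """Return (name, score) of the library structure with the highest score."""
    scored = ((name, thread_score(predicted, states))
              for name, states in library.items())
    return max(scored, key=lambda pair: pair[1])
```

Calling `best_fold` with a predicted string and a dictionary of template strings returns the best-scoring template, which corresponds to the "overall best sequence-structure pair" in the abstract.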

Pedestrian guide to analyzing sequence databases
Rost, B.; Schneider, Reinhard UL; Sander, C.

in WWW-publication (1997)

Over the past few years our means of communication have changed rapidly due to the growth of the World Wide Web (WWW). The Web enables molecular biologists to immediately access databases, scan literature, find information about related research and researchers, and to trace cell cultures. Wet-lab biologists can uncover information about the protein of interest without having to become experts in sequence analysis. Here, we present a variety of tools, provide an overview of the state of the art in sequence analysis, and describe some of the principles of the methods.

Full Text
Peer Reviewed
Characterization of new proteins found by analysis of short open reading frames from the full yeast genome
Andrade, M. A.; Daruvar, A.; Casari, G. et al

in Journal of Yeast and Fungal Research (1997), 13(14), 1363-1374

We have analysed short open reading frames (between 150 and 300 base pairs long) of the yeast genome (Saccharomyces cerevisiae) with a two-step strategy. The first step selects a candidate set of open reading frames from the DNA sequence based on statistical evaluation of DNA and protein sequence properties. The second step filters the candidate set by selecting open reading frames with high similarity to other known sequences (from any organism). As a result, we report ten new predicted proteins not present in the current sequence databases. These include a new alcohol dehydrogenase, a protein probably related to the cell cycle, as well as a homolog of the prokaryotic ribosomal protein L36 likely to be a mitochondrial ribosomal protein coded in the nuclear genome. We conclude that the analysis of short open reading frames leads to biologically interesting discoveries, even though the quantitative yield of new proteins is relatively low.
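
As an illustration of the length window mentioned in this abstract, the following minimal sketch lists open reading frames of 150 to 300 base pairs on the forward strand. It is an assumption-laden simplification: the function name `short_orfs` is hypothetical, an ORF is taken to run from the first ATG in a frame to the next in-frame stop codon, the reverse strand is not scanned, and neither the statistical evaluation of step one nor the similarity filter of step two is reproduced.

```python
STOPS = {"TAA", "TAG", "TGA"}


def short_orfs(dna, min_len=150, max_len=300):
    """Return (start, end) index pairs of forward-strand ORFs whose length
    in base pairs, start and stop codons included, is within the window."""
    dna = dna.upper()
    found = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                length = i + 3 - start
                if min_len <= length <= max_len:
                    found.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return found
```

The returned pairs are half-open Python slice indices, so `dna[start:end]` yields the candidate ORF that a similarity search could then examine.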

Full Text
Peer Reviewed
Sequence analysis of the Methanococcus jannaschii genome and the prediction of protein function
Andrade, M.; Casari, G.; deDaruvar, A. et al

in Computer Applications in the Biosciences [=CABIOS] (1997), 13(4), 481-483

Full Text
GeneCrunch and Europort, examples for Hierarchical Supercomputing at Silicon Graphics
Schneider, Reinhard UL; Schlenkrich, M.

in WWW-publication (1996)

The SGI POWER CHALLENGEarray™ represents a hierarchical supercomputer because it combines distributed and shared memory technology. We present two projects, Europort and GeneCrunch, that took advantage of such a configuration. In Europort we performed scalability demonstrations up to 64 processors with applications relevant to the chemical and pharmaceutical industries. GeneCrunch, a project in bioinformatics, performed an analysis of the whole yeast genome using the software system GeneQuiz. This project showcased the future demands of HPC in pharmaceutical industries in tackling analysis of fast-growing volumes of sequence information. GeneQuiz, an automated software system for large-scale genome analysis developed at the EMBL/EBI, aims at predicting the function of new genes by using an automated, rigorous, rule-based system to process the results of sequence analysis and database searches to build databases of annotations and predictions. In GeneCrunch more than 6,000 proteins from baker's yeast, for which the complete genomic sequence was completed in 1996, were analyzed on an SGI® POWER CHALLENGEarray with 64 processors (R8000® at 90 MHz) in three days rather than the seven months predicted for a normal workstation.

Peer Reviewed
The HSSP database of protein structure sequence alignments
Schneider, Reinhard UL; Sander, C.

in Nucleic Acids Research (1996), 24(1), 201-205

HSSP is a derived database merging structural three-dimensional (3-D) and sequence one-dimensional (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in Swissprot using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 27% of all Swissprot-stored sequences.

Full Text
Peer Reviewed
Bioinformatics and the discovery of gene function
Casari, G.; Daruvar, Dea; Sander, C. et al

in Trends in Genetics (1996), 12(7), 244-245

Scientific history was made in completing the yeast genome sequence, yet its 13 Mb are a mere starting point. Two challenges loom large: to decipher the function of all genes and to describe the workings of the eukaryotic cell in full molecular detail. A combination of experimental and theoretical approaches will be brought to bear on these challenges. What will be next in yeast genome analysis from the point of view of bioinformatics?

Peer Reviewed
GeneCrunch: Experiences on the SGI POWER CHALLENGEarray with Bioinformatics applications
Schneider, Reinhard UL; Casari, G.; Daruvar, A. et al

in Supercomputer 96 : Anwendungen, Architekturen, Trends (1996)

Analyzing genomic data is a computationally intensive and complicated process in which scientists must typically choose among multiple databases and analysis methods and make expert judgements inspecting multiple results. GeneQuiz, an automated software system for large-scale genome analysis developed at the EMBL/EBI, tackles this problem by using an automated, rigorous, rule-based system to select among the results of sequence analysis and database searches, builds informative annotation and aims at predicting the function of new genes. In a demonstration project more than 6000 proteins from baker's yeast, for which the complete genomic sequence was completed in 1996, were analyzed on a Silicon Graphics POWER CHALLENGEarray with 64 processors (R8000 at 90 MHz) so that the analysis could be completed in 3 days. The results of the analysis were published on two web servers as they were computed.

Peer Reviewed
Fast and sensitive search of information databases for biological relationships
Schneider, Reinhard UL; Casari, G.; Sander, C.

in Statustagung des BMBF, HPSC 95, Stand und Perspektiven des Parallelen Höchstleistungsrechnens und seiner Anwendungen (1995)

Sequence comparison has become an essential and standard tool in the analysis of genomic data. Genome projects will decipher much of the genetic information in many organisms, including humans. As a result, the computational cost of databank searches will increase dramatically. In addition, the search for biologically meaningful homology between a newly determined sequence and sequences already stored in the various databanks becomes increasingly important as most of the new data will be in raw, not yet understood form. The detection of sufficient similarity between a newly determined sequence and a protein of known function or even known 3D structure in a databank allows one to transfer most of the knowledge from one sequence to the other. The result can be enormous savings in genetic and biochemical laboratory efforts.

Peer Reviewed
Exploring the Mycoplasma capricolum genome: a minimal cell reveals its physiology
Bork, P.; Ouzounis, C.; Casari, G. et al

in Molecular Microbiology (1995), 16(5), 955-967

We report on the analysis of 214 kb of the parasitic eubacterium Mycoplasma capricolum sequenced by genomic walking techniques. The 287 putative proteins detected to date represent about half of the estimated total number of 500 predicted for this organism. A large fraction of these (75%) can be assigned a likely function as a result of similarity searches. Several important features of the functional organization of this small genome are already apparent. Among these are (i) the expected relatively large number of enzymes involved in metabolic transport and activation, for efficient use of host cell nutrients; (ii) the presence of anabolic enzymes; (iii) the unexpected diversity of enzymes involved in DNA replication and repair; and (iv) a sizeable number of orthologues (82 so far) in Escherichia coli. This survey is beginning to provide a detailed view of how M. capricolum manages to maintain essential cellular processes with a genome much smaller than that of its bacterial relatives.

Full Text
Peer Reviewed
Challenging times for bioinformatics
Casari, G.; Andrade, M. A.; Bork, P. et al

in Nature (1995), 376(6542), 647-648

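Sequenz und Sequenz-Struktur Vergleiche und deren Anwendung für die Struktur- und Funktionsvorhersage von Proteinen [Sequence and sequence-structure comparisons and their application to the prediction of protein structure and function]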
Schneider, Reinhard UL

Doctoral thesis (1994)

Summary of the inaugural dissertation. Name: Reinhard Schneider. Title: Sequenz und Sequenz-Struktur Vergleiche und deren Anwendung für die Struktur- und Funktionsvorhersage von Proteinen (Sequence and sequence-structure comparisons and their application to the prediction of protein structure and function). Supervisor: Prof. Dr. K. C. Holmes (MPI für medizinische Forschung, Heidelberg). The so-called genome projects will lead to an enormous growth of the biological sequence databases in the coming years. Analysis of these sequence data with computer-aided methods is an indispensable prerequisite for using this raw material. One of the main applications of computers in predicting protein function and structure will be selective database searches for biologically significant similarities. To assess the significance of a protein sequence comparison (alignment), an empirically derived homology threshold was defined. Its most important feature is a strong dependence on the length of the alignment in question. This significance estimate makes it possible both to exclude unrelated proteins and to detect weak sequence relationships. Because the homology threshold is generally valid, it can be used as a simple and efficient additional filter for other methods, such as fast database searches. A new algorithm for multiple sequence alignment was developed that has relatively low computational complexity. The main feature of this algorithm is the derivation of so-called position-dependent conservation weights, which are used as additional parameters in the dynamic programming algorithm and lead to markedly increased sensitivity in database searches. The design of the implementation allows future extension to the comparison of a sequence against a sequence profile or of two sequence profiles.
To keep sensitive database searches feasible within acceptable computing time in the future, the program was ported to parallel computers. The results show that almost interactive work is possible with the massively parallel computers available today. Building on this work, the profile methods are currently being implemented on the latest generation of parallel computers within a European project, and their benefit for industrial "protein design" is being assessed. With the help of the homology threshold, a database of homology-derived protein structures (HSSP) was developed. This database is made publicly available in various ways and has established itself as something of a standard. It is used in automated three-dimensional model building of protein structures and as a tool and data basis for a wide range of statistical and other theoretical work. Use of the database contributed decisively to the development of what is currently the best program for predicting protein secondary structure. This prediction method is based on a neural network that exploits the information in a multiple sequence alignment. The program developed in this thesis is used to compute the multiple sequence alignments and to perform the database search they require. The method has been made publicly available as a prediction service accessible over international data lines. The method of sequence-structure alignment ("threading") represents a new approach to predicting protein structure when there is no sequence relationship to an already known structure. For this purpose a three-dimensional structure is described in terms of interatomic contacts, and the fitness of a sequence for a structure is evaluated using preference parameters.
The results show that both an improved abstract description of a three-dimensional protein structure and an improved alignment algorithm are needed. A promising approach in practice is to use methods that permit an abstract 3D description and additionally incorporate a certain degree of sequence information, for example in the form of a sequence profile. A complex functional genome analysis was carried out on a complete yeast chromosome as an example. A number of biologically interesting sequence relationships were uncovered, but doing so required a great deal of work. The main problem turned out to be insufficient integration of the available methods and heterogeneous databases. The experience gained is currently feeding into the development of an integrated software package that will make it possible to drastically reduce the effort needed to analyse large volumes of data.
