References of "BMC bioinformatics"
Full Text
Peer Reviewed
Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence.
Barbosa Da Silva, Adriano UL; Satagopam, Venkata UL; Schneider, Reinhard UL et al

in BMC bioinformatics (2008), 9

BACKGROUND: Modern proteomes evolved by modification of pre-existing ones. It is extremely important to comparative biology that related proteins be identified as members of the same cognate group, since a characterized putative homolog could be used to find clues about the function of uncharacterized proteins from the same group. Typically, databases of related proteins focus on those from completely sequenced genomes. Unfortunately, relatively few organisms have had their genomes fully sequenced; accordingly, many proteins are ignored by the currently available databases of cognate proteins, despite the large number of important genes that are functionally described only for these incomplete proteomes. RESULTS: We have developed a method to cluster cognate proteins from multiple organisms beginning with only one sequence, through connectivity saturation with that seed sequence. We show that the generated clusters are in agreement with some other approaches based on full genome comparison. CONCLUSION: The method produced results that are as reliable as those produced by conventional clustering approaches. Generating clusters based only on individual proteins of interest is less time-consuming than generating clusters for whole proteomes.

Full Text
Peer Reviewed
Prediction of TF target sites based on atomistic models of protein-DNA complexes.
Espinosa Angarica, Vladimir UL; Perez, Abel Gonzalez; Vasconcelos, Ana T. et al

in BMC bioinformatics (2008), 9

BACKGROUND: The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. RESULTS: Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. CONCLUSION: Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.

Peer Reviewed
Estimation of the number of extreme pathways for metabolic networks.
Yeung, Matthew; Thiele, Ines UL; Palsson, Bernard O.

in BMC Bioinformatics (2007), 8(1), 363

BACKGROUND: The set of extreme pathways (ExPa), $\{p_i\}$, defines the convex basis vectors used for the mathematical characterization of the null space of the stoichiometric matrix for biochemical reaction networks. ExPa analysis has been used for a number of studies to determine properties of metabolic networks as well as to obtain insight into their physiological and functional states in silico. However, the number of ExPas, $p = |\{p_i\}|$, grows with the size and complexity of the network being studied, and this poses a computational challenge. For this study, we investigated the relationship between the number of extreme pathways and simple network properties. RESULTS: We established an estimating function for the number of ExPas using these easily obtainable network measurements. In particular, it was found that $\log[p]$ had an exponential relationship with $\log\left[\sum_{i=1}^{R} d_{-i}\, d_{+i}\, c_i\right]$, where $R = |R_{\mathrm{eff}}|$ is the number of active reactions in a network, $d_{-i}$ and $d_{+i}$ the incoming and outgoing degrees of the reactions $r_i$ in $R_{\mathrm{eff}}$, and $c_i$ the clustering coefficient for each active reaction. CONCLUSION: This relationship typically gave an estimate of the number of extreme pathways to within a factor of 10 of the true number. Such a function providing an estimate for the total number of ExPas for a given system will enable researchers to decide whether ExPas analysis is an appropriate investigative tool.
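The network quantity that enters the estimating function above can be computed directly once the per-reaction degrees and clustering coefficients are known. The following is only a minimal sketch: the function name `expa_predictor`, the tuple layout, the toy numbers, and the choice of a base-10 logarithm are all assumptions made for illustration. The abstract does not state the fitted constants of the exponential relationship, so the sketch stops at the predictor and does not attempt to estimate the number of ExPas itself.

```python
import math


def expa_predictor(reactions):
    """Return log10 of the sum over active reactions of d_in * d_out * c.

    `reactions` is an iterable of (d_in, d_out, c) tuples, one per active
    reaction: incoming degree, outgoing degree, clustering coefficient.
    The base-10 logarithm is an assumption; the abstract does not name a base.
    """
    total = sum(d_in * d_out * c for d_in, d_out, c in reactions)
    if total <= 0:
        raise ValueError("sum of d_in * d_out * c must be positive")
    return math.log10(total)


# Hypothetical toy network with three active reactions.
toy_reactions = [(2, 3, 0.5), (1, 4, 0.25), (3, 2, 1.0)]
predictor = expa_predictor(toy_reactions)
print(predictor)
```

The returned value would serve as the independent variable of the fitted relationship reported in the paper.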

Full Text
Peer Reviewed
Identifying protein complexes directly from high-throughput TAP data with Markov random fields.
Rungsarityotin, Wasinee; Krause, Roland UL; Schodl, Arno et al

in BMC Bioinformatics (2007), 8

BACKGROUND: Predicting protein complexes from experimental data remains a challenge due to limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process. First, they construct an interaction graph from the data, predominantly using heuristics, and subsequently cluster its vertices to identify protein complexes. RESULTS: We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes based on Markov random fields explicitly incorporates false negative and false positive errors and exhibits a high robustness to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU Public License, can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes. CONCLUSION: We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.

Full Text
Peer Reviewed
In search of functional association from time-series microarray data based on the change trend and level of gene expression.
He, Feng UL; Zeng, An-Ping

in BMC Bioinformatics (2006), 7

BACKGROUND: The increasing availability of time-series expression data opens up new possibilities to study functional linkages of genes. Present methods used to infer functional linkages between genes from expression data are mainly based on a point-to-point comparison. Change trends between consecutive time points in time-series data have so far not been well explored. RESULTS: In this work we present a new method based on extracting main features of the change trend and level of gene expression between consecutive time points. The method, termed trend correlation (TC), includes two major steps: (1) calculating a maximal local alignment of change trend score by dynamic programming and a change trend correlation coefficient between the maximal matched change levels of each gene pair; (2) inferring relationships of gene pairs based on two statistical extraction procedures. The new method considers time shifts and inverted relationships in a similar way as the local clustering (LC) method, but the latter is merely based on a point-to-point comparison. The TC method is demonstrated with data from yeast cell cycle and compared with the LC method and the widely used Pearson correlation coefficient (PCC) based clustering method. The biological significance of the gene pairs is examined with several large-scale yeast databases. Although the TC method predicts an overall lower number of gene pairs than the other two methods at the same p-value threshold, the additional number of gene pairs inferred by the TC method is considerable: e.g. 20.5% compared with the LC method and 49.6% with the PCC method for a p-value threshold of 2.7E-3. Moreover, the percentage of the inferred gene pairs consistent with databases by our method is generally higher than that of the LC method and similar to that of the PCC method. A significant number of the gene pairs only inferred by the TC method are process-identity or function-similarity pairs or have well-documented biological interactions, including 443 known protein interactions and some known cell cycle related regulatory interactions. It should be emphasized that the overlapping of gene pairs detected by the three methods is normally not very high, indicating a necessity of combining the different methods in search of functional association of genes from time-series data. For a p-value threshold of 1E-5 the percentage of process-identity and function-similarity gene pairs among the shared part of the three methods reaches 60.2% and 55.6% respectively, building a good basis for further experimental and functional study. Furthermore, the combined use of methods is important to infer more complete regulatory circuits and network as exemplified in this study. CONCLUSION: The TC method can significantly augment the current major methods to infer functional linkages and biological network and is well suited for exploring temporal relationships of gene expression in time-series data.
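The idea of aligning change trends with dynamic programming can be illustrated with a small sketch. This is not the scoring scheme of the TC method, which the abstract does not specify: the helper names `change_trends` and `max_local_trend_score`, the +1/-1 match and mismatch scores, the tolerance parameter `tol`, and the gap-free recurrence are all assumptions chosen for illustration. Inverted relationships and the change-level correlation coefficient are not covered.

```python
def change_trends(series, tol=0.0):
    """Map a time series to trends between consecutive points: +1, 0 or -1."""
    trends = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if abs(diff) <= tol:
            trends.append(0)
        else:
            trends.append(1 if diff > 0 else -1)
    return trends


def max_local_trend_score(x, y, tol=0.0):
    """Best gap-free local alignment score between the trend sequences of x and y.

    Uses the recurrence S[i][j] = max(0, S[i-1][j-1] + step), where step is
    +1 for matching trends and -1 otherwise (assumed scores). Because every
    (i, j) pair is considered, time-shifted matches are found as well.
    """
    a, b = change_trends(x, tol), change_trends(y, tol)
    best = 0
    prev_row = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = 1 if a[i - 1] == b[j - 1] else -1
            row[j] = max(0, prev_row[j - 1] + step)
            best = max(best, row[j])
        prev_row = row
    return best


# Hypothetical expression profiles; gene_b follows gene_a with a delay of one time point.
gene_a = [1, 2, 3, 2, 1]
gene_b = [0, 1, 2, 3, 2]
print(max_local_trend_score(gene_a, gene_b))
```

Only one previous row of the score matrix is kept, since the recurrence looks back a single diagonal step.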

Peer Reviewed
A domain-oriented approach to the reduction of combinatorial complexity in signal transduction networks.
Conzelmann, Holger; Saez-Rodriguez, Julio; Sauter, Thomas UL et al

in BMC Bioinformatics (2006), 7

BACKGROUND: Receptors and scaffold proteins possess a number of distinct domains and bind multiple partners. A common problem in modeling signaling systems arises from a combinatorial explosion of different states generated by feasible molecular species. The number of possible species grows exponentially with the number of different docking sites and can easily reach several millions. Models accounting for this combinatorial variety become impractical for many applications. RESULTS: Our results show that under realistic assumptions on domain interactions, the dynamics of signaling pathways can be exactly described by reduced, hierarchically structured models. The method presented here provides a rigorous way to model a large class of signaling networks using macro-states (macroscopic quantities such as the levels of occupancy of the binding domains) instead of micro-states (concentrations of individual species). The method is described using generic multidomain proteins and is applied to the molecule LAT. CONCLUSION: The presented method is a systematic and powerful tool to derive reduced model structures describing the dynamics of multiprotein complex formation accurately.

Full Text
Peer Reviewed
Applying dynamic Bayesian networks to perturbed gene expression data
Norbert, Dojer; Gambin, Anna; Mizera, Andrzej UL et al

in BMC Bioinformatics (2006), 7

Peer Reviewed
Connectivity independent protein-structure alignment: a hierarchical approach.
Kolbeck, Bjoern; May, Patrick UL; Schmidt-Goenner, Tobias et al

in BMC Bioinformatics (2006), 7

BACKGROUND: Protein-structure alignment is a fundamental tool to study protein function, evolution and model building. In the last decade several methods for structure alignment were introduced, but most of them ignore that structurally similar proteins can share the same spatial arrangement of secondary structure elements (SSE) but differ in the underlying polypeptide chain connectivity (non-sequential SSE connectivity). RESULTS: We perform protein-structure alignment using a two-level hierarchical approach implemented in the program GANGSTA. On the first level, pair contacts and relative orientations between SSEs (i.e. alpha-helices and beta-strands) are maximized with a genetic algorithm (GA). On the second level residue pair contacts from the best SSE alignments are optimized. We have tested the method on visually optimized structure alignments of protein pairs (pairwise mode) and for database scans. For a given protein structure, our method is able to detect significant structural similarity of functionally important folds with non-sequential SSE connectivity. The performance for structure alignments with strictly sequential SSE connectivity is comparable to that of other structure alignment methods. CONCLUSION: As demonstrated for several applications, GANGSTA finds meaningful protein-structure alignments independent of the SSE connectivity. GANGSTA is able to detect structural similarity of protein folds that are assigned to different superfamilies but nevertheless possess similar structures and perform related functions, even if these proteins differ in SSE connectivity.
