References of "Galas, David J. 40000219"
Peer Reviewed
A novel Fanconi anemia subtype associated with a dominant-negative mutation in RAD51
Ameziane, Najim; May, Patrick UL; Van de Vrugt, Henri J. et al

in Nature Communications (2015), 6(8829)

Fanconi anemia (FA) is a hereditary disease featuring hypersensitivity to DNA cross-linker-induced chromosomal instability in association with developmental abnormalities, bone marrow failure and a strong predisposition to cancer. 17 FA disease genes have been reported, all of which act in a recessive mode of inheritance. Here we report on a de novo g.41022153G>A; p.Ala293Thr (NM_002875) missense mutation in one allele of the homologous recombination DNA repair gene RAD51 in an FA-like patient. This heterozygous mutation causes a novel FA subtype, “FA-R”, which appears to be the first subtype of FA caused by a dominant-negative mutation. The patient, who features microcephaly and mental retardation, has reached adulthood without the typical bone marrow failure and pediatric cancers. Together with the recent reports on RAD51-associated congenital mirror movement disorders, our results point to an important role for RAD51-mediated homologous recombination in neurodevelopment, in addition to DNA repair and cancer susceptibility.

Peer Reviewed
The extracellular RNA complement of Escherichia coli
Ghosal, Anubrata UL; Upadhyaya, Bimal Babu UL; Fritz, Joëlle UL et al

in MicrobiologyOpen (2015)

The secretion of biomolecules into the extracellular milieu is a common and well-conserved phenomenon in biology. In bacteria, secreted biomolecules are not only involved in intra-species communication but they also play roles in inter-kingdom exchanges and pathogenicity. To date, released products, such as small molecules, DNA, peptides, and proteins, have been well studied in bacteria. However, the bacterial extracellular RNA complement has so far not been comprehensively characterized. Here, we have analyzed, using a combination of physical characterization and high-throughput sequencing, the extracellular RNA complement of both outer membrane vesicle (OMV)-associated and OMV-free RNA of the enteric Gram-negative model bacterium Escherichia coli K-12 substrain MG1655 and have compared it to its intracellular RNA complement. Our results demonstrate that a large part of the extracellular RNA complement is in the size range between 15 and 40 nucleotides and is derived from specific intracellular RNAs. Furthermore, RNA is associated with OMVs and the relative abundances of RNA biotypes in the intracellular, OMV and OMV-free fractions are distinct. Apart from rRNA fragments, a significant portion of the extracellular RNA complement is composed of specific cleavage products of functionally important structural noncoding RNAs, including tRNAs, 4.5S RNA, 6S RNA, and tmRNA. In addition, the extracellular RNA pool includes RNA biotypes from cryptic prophages, intergenic, and coding regions, of which some are so far uncharacterised, for example, transcripts mapping to the fimA-fimL and ves-spy intergenic regions. Our study provides the first detailed characterization of the extracellular RNA complement of the enteric model bacterium E. coli. Analogous to findings in eukaryotes, our results suggest the selective export of specific RNA biotypes by E. coli, which in turn indicates a potential role for extracellular bacterial RNAs in intercellular communication.

Peer Reviewed
Biological data analysis as an information theory problem: multivariable dependence measures and the shadows algorithm
Sakhanenko, Nikita A.; Galas, David J. UL

in Journal of Computational Biology (2015), 22(11), 1005-1024

Information theory is valuable in multiple-variable analysis for being model-free and nonparametric, and for the modest sensitivity to undersampling. We previously introduced a general approach to finding multiple dependencies that provides accurate measures of levels of dependency for subsets of variables in a data set, which is significantly nonzero only if the subset of variables is collectively dependent. This is useful, however, only if we can avoid a combinatorial explosion of calculations for increasing numbers of variables. The proposed dependence measure for a subset of variables, tau, differential interaction information, Delta(tau), has the property that for subsets of tau some of the factors of Delta(tau) are significantly nonzero, when the full dependence includes more variables. We use this property to suppress the combinatorial explosion by following the "shadows" of multivariable dependency on smaller subsets. Rather than calculating the marginal entropies of all subsets at each degree level, we need to consider only calculations for subsets of variables with appropriate "shadows." The number of calculations for n variables at a degree level of d therefore grows at a much smaller rate than the binomial coefficient (n, d), but depends on the parameters of the "shadows" calculation. This approach, avoiding a combinatorial explosion, enables the use of our multivariable measures on very large data sets. We demonstrate this method on simulated data sets, and characterize the effects of noise and sample numbers. In addition, we analyze a data set of a few thousand mutant yeast strains interacting with a few thousand chemical compounds.
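
As a rough illustration of the combinatorial explosion this abstract refers to, the sketch below counts how many variable subsets an exhaustive search would have to evaluate. It does not implement the shadows algorithm; the function names are hypothetical and chosen only for this example.

```python
from math import comb


def subsets_at_degree(n: int, d: int) -> int:
    """Number of d-variable subsets of n variables: the binomial coefficient (n, d)."""
    return comb(n, d)


def subsets_up_to_degree(n: int, d_max: int) -> int:
    """Total subsets an exhaustive search visits for degrees 1 through d_max."""
    return sum(comb(n, d) for d in range(1, d_max + 1))


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    for n in (10, 100, 1000):
        print(n, [subsets_at_degree(n, d) for d in (2, 3, 4)])
```

The exhaustive count grows roughly as n^d for fixed d, which is the growth rate the abstract says the shadows approach stays well below.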

Meeting report: discussions and preliminary findings on extracellular RNA measurement methods from laboratories in the NIH Extracellular RNA Communication Consortium
Laurent, Louise; Abdel-Mageed, Asim; Adelson, P. David et al

in Journal of Extracellular Vesicles (2015), 4

Extracellular RNAs (exRNAs) have been identified in all tested biofluids and have been associated with a variety of extracellular vesicles, ribonucleoprotein complexes and lipoprotein complexes. Much of the interest in exRNAs lies in the fact that they may serve as signalling molecules between cells, their potential to serve as biomarkers for prediction and diagnosis of disease and the possibility that exRNAs or the extracellular particles that carry them might be used for therapeutic purposes. Among the most significant bottlenecks to progress in this field is the lack of robust and standardized methods for collection and processing of biofluids, separation of different types of exRNA-containing particles and isolation and analysis of exRNAs. The Sample and Assay Standards Working Group of the Extracellular RNA Communication Consortium is a group of laboratories funded by the U.S. National Institutes of Health to develop such methods. In our first joint endeavour, we held a series of conference calls and in-person meetings to survey the methods used among our members, placed them in the context of the current literature and used our findings to identify areas in which the identification of robust methodologies would promote rapid advancements in the exRNA field.

Peer Reviewed
Systems genomics evaluation of the SH-SY5Y neuroblastoma cell line as a model for Parkinson’s disease
Krishna, Abhimanyu UL; Biryukov, Maria UL; Trefois, Christophe UL et al

in BMC Genomics (2014), 15(1154)

Background: The human neuroblastoma cell line, SH-SY5Y, is a commonly used cell line in studies related to neurotoxicity, oxidative stress, and neurodegenerative diseases. Although this cell line is often used as a cellular model for Parkinson’s disease, the relevance of this cellular model in the context of Parkinson’s disease (PD) and other neurodegenerative diseases has not yet been systematically evaluated. Results: We have used a systems genomics approach to characterize the SH-SY5Y cell line using whole-genome sequencing to determine the genetic content of the cell line and used transcriptomics and proteomics data to determine molecular correlations. Further, we integrated genomic variants using a network analysis approach to evaluate the suitability of the SH-SY5Y cell line for perturbation experiments in the context of neurodegenerative diseases, including PD. Conclusions: The systems genomics approach showed consistency across different biological levels (DNA, RNA and protein concentrations). Most of the genes belonging to the major Parkinson’s disease pathways and modules were intact in the SH-SY5Y genome. Specifically, each analysed gene related to PD has at least one intact copy in SH-SY5Y. The disease-specific network analysis approach ranked the genetic integrity of SH-SY5Y as higher for PD than for Alzheimer’s disease but lower than for Huntington’s disease and Amyotrophic Lateral Sclerosis for loss of function perturbation experiments.

Peer Reviewed
Mutations in STX1B, encoding a presynaptic protein, cause fever-associated epilepsy syndromes
Schubert, Julian; Siekierska, Aleksandra; Langlois, Melanie UL et al

in Nature Genetics (2014), 46(12), 1327-32

Febrile seizures affect 2–4% of all children and have a strong genetic component. Recurrent mutations in three main genes (SCN1A, SCN1B and GABRG2) have been identified that cause febrile seizures with or without epilepsy. Here we report the identification of mutations in STX1B, encoding syntaxin-1B, that are associated with both febrile seizures and epilepsy. Whole-exome sequencing in independent large pedigrees identified cosegregating STX1B mutations predicted to cause an early truncation or an in-frame insertion or deletion. Three additional nonsense or missense mutations and a de novo microdeletion encompassing STX1B were then identified in 449 familial or sporadic cases. Video and local field potential analyses of zebrafish larvae with antisense knockdown of stx1b showed seizure-like behavior and epileptiform discharges that were highly sensitive to increased temperature. Wild-type human syntaxin-1B but not a mutated protein rescued the effects of stx1b knockdown in zebrafish. Our results thus implicate STX1B and the presynaptic release machinery in fever-associated epilepsy syndromes.

Peer Reviewed
Molecular and Clinical Evidence for an ARMC5 Tumor Syndrome: Concurrent Inactivating Germline and Somatic Mutations are Associated with both Primary Macronodular Adrenal Hyperplasia and Meningioma
Eibelt, Ulf; Trovato, Alissa; Kloth, Michael et al

in Journal of Clinical Endocrinology and Metabolism (2014)

Context: Primary macronodular adrenal hyperplasia (PMAH) is a rare cause of Cushing's syndrome (CS), which may present in the context of different familial multitumor syndromes. Heterozygous inactivating germline mutations of armadillo repeat containing 5 (ARMC5) have very recently been described as a cause of sporadic PMAH. Whether this genetic condition also causes familial PMAH in association with other neoplasias is unclear. Objective: The aim of the present study was to delineate the molecular cause in a large family with PMAH and other neoplasias. Patients and Methods: Whole genome sequencing and comprehensive clinical and biochemical phenotyping were performed in members of a PMAH-affected family. Nodules derived from adrenal surgery and pancreatic and meningeal tumor tissue were analysed for accompanying somatic mutations in the identified target genes. Results: PMAH presenting either as overt or subclinical CS was accompanied by a heterozygous germline mutation in ARMC5 (p.A110fs*9) located on chromosome 16. Analysis of tumor tissue showed different somatic ARMC5 mutations in adrenal nodules, supporting a “second hit” hypothesis with inactivation of a tumor suppressor gene. A damaging somatic ARMC5 mutation was also found in a concomitant meningioma (p.R502fs) but not in a pancreatic tumor, suggesting biallelic inactivation of ARMC5 as causal also for the intracranial meningioma. Conclusions: Our analysis further confirms inherited inactivating ARMC5 mutations as a cause of familial PMAH and suggests an additional role for the development of concomitant intracranial meningiomas.

Peer Reviewed
A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data.
Hu, Hao; Roach, Jared C.; Coon, Hilary et al

in Nature Biotechnology (2014), 32(7), 663-669

High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across studies of monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.

Peer Reviewed
Discovering pair-wise genetic interactions: an information theory-based approach.
Ignac, Tomasz UL; Skupin, Alexander UL; Sakhanenko, Nikita A. et al

in PLoS ONE (2014), 9(3), e92310

Phenotypic variation, including that which underlies health and disease in humans, results in part from multiple interactions among both genetic variation and environmental factors. While diseases or phenotypes caused by single gene variants can be identified by established association methods and family-based approaches, complex phenotypic traits resulting from multi-gene interactions remain very difficult to characterize. Here we describe a new method based on information theory, and demonstrate how it improves on previous approaches to identifying genetic interactions, including both synthetic and modifier kinds of interactions. We apply our measure, called interaction distance, to previously analyzed data sets of yeast sporulation efficiency, lipid related mouse data and several human disease models to characterize the method. We show how the interaction distance can reveal novel gene interaction candidates in experimental and simulated data sets, and outperforms other measures in several circumstances. The method also allows us to optimize case/control sample composition for clinical studies.

Peer Reviewed
RNA in circulation: sources and functions of extracellular exogenous RNA in blood
Galas, David J. UL; Wilmes, Paul UL; Wang, Kai

in Nelson, Karen (Ed.) Encyclopedia of Metagenomics (2014)

Molecules of many kinds are abundant in circulating blood and play a wide range of important roles, both known and unknown. These include macromolecules like proteins and nucleic acids and a wide range of smaller molecules. A number of questions are raised by recent findings of stable RNA molecules in plasma, that is, circulating RNA outside of cells. Among the issues that need to be addressed are: what are the origins of these RNA molecules; what are the mechanisms by which they enter and are stabilized in the blood; what are their possible biological functions; and finally, what are the potential applications of these extracellular RNA molecules in diagnostic and therapeutic medicine? While the precise biological functions remain to be pinned down, extracellular RNA has been proposed as a vehicle for a previously unknown cell-cell communication system. Recent reports of the detection of foreign, exogenous sources of some of the extracellular RNA have thus intensified the need to investigate and understand these processes. This overview summarizes the findings, some recent developments, and the current state of research in the circulating RNA field, and some of the key open questions in the field are specifically addressed.

Peer Reviewed
An evaluation of high-throughput approaches to QTL mapping in Saccharomyces cerevisiae
Wilkening, Stefan; Lin, Gen; Fritsch, Emilie S. et al

in Genetics (2014), 196(3), 853-65

Dissecting the molecular basis of quantitative traits is a significant challenge and is essential for understanding complex diseases. Even in model organisms, precisely determining causative genes and their interactions has remained elusive, due in part to difficulty in narrowing intervals to single genes and in detecting epistasis or linked quantitative trait loci. These difficulties are exacerbated by limitations in experimental design, such as low numbers of analyzed individuals or of polymorphisms between parental genomes. We address these challenges by applying three independent high-throughput approaches for QTL mapping to map the genetic variants underlying 11 phenotypes in two genetically distant Saccharomyces cerevisiae strains, namely (1) individual analysis of >700 meiotic segregants, (2) bulk segregant analysis, and (3) reciprocal hemizygosity scanning, a new genome-wide method that we developed. We reveal differences in the performance of each approach and, by combining them, identify eight polymorphic genes that affect eight different phenotypes: colony shape, flocculation, growth on two nonfermentable carbon sources, and resistance to two drugs, salt, and high temperature. Our results demonstrate the power of individual segregant analysis to dissect QTL and address the underestimated contribution of interactions between variants. We also reveal confounding factors like mutations and aneuploidy in pooled approaches, providing valuable lessons for future designs of complex trait mapping studies.

Peer Reviewed
The spectrum of circulating RNA: A window into systems toxicology
Wang, Kai; Yuan, Yue; Cho, Ji-Hoon et al

in Toxicological Sciences (2013), 132(2), 478-492

Adverse effects caused by therapeutic drugs are a serious and costly health concern. Despite the body’s systemic responses to therapeutics, the liver is often the focus of damage and is usually the focus of studies of toxic effects due to its active roles in the metabolism of xenobiotics. It is extremely difficult, however, to assess systemic responses with currently available methods. Comprehensive cataloging of cell-free circulating RNAs using next-generation sequencing technology may open a window to assess drug-associated adverse effects at the systems level. To explore this potential, we conducted an RNA profiling study using the well-characterized acetaminophen overdose mouse model on liver and plasma with microarray and next-generation sequencing platforms, respectively. After drug treatment, the levels of a number of transcripts, both endogenous and exogenous RNAs, showed significant changes in plasma, reflecting not only the classical liver injury induced by acetaminophen overdose but also damage in tissues other than the liver. The changes in exogenous RNAs also reflect alteration on dieting behavior after acetaminophen overdose. Besides reporting an extensive list of circulating RNA-based biomarker candidates, this study illustrates the possibility of using circulating RNAs to assess global effects of therapeutics. This could also lead to a new approach for a more comprehensive assessment of the efficacy and safety of therapeutics.

Peer Reviewed
Differentiated SH-SY5Y Cells as PD Model for Mitochondrial Dysfunction: From Whole Genome Sequencing to an Educated Design of High-Throughput Experiments
Antony, Paul UL; Krishna, Abhimanyu UL; May, Patrick UL et al

Poster (2013)

Objectives: Mitochondrial dysfunction is a cellular hallmark of Parkinson's disease (PD) and energetic stress of dopaminergic neurons appears to be a physiological risk factor for mitochondrial dysfunction. It is however challenging to assess the high variety of factors regulating mitochondrial physiology in living neurons in a high throughput manner. To overcome this bottleneck, we established an analysis platform, using the neuroblastoma cell line SH-SY5Y. For the first time, we have characterized the SH-SY5Y cell line in an integrated whole genome, transcriptome, and proteome approach. In addition, we show that neuronal differentiation improves the physiological properties of this experimental model for studying mitochondrial dysfunction in PD. Methods: Whole genome sequencing, RNA-Seq, qRT-PCR, MS, FRET using voltage-sensing proteins, immunofluorescence, cytometry, and live cell imaging. Results: The integrated molecular characterization of SH-SY5Y uncovers the level of molecular network integrity and hence the relevance of this cell line for targeted studies in selected molecular processes. Furthermore, we dissect changes in mitochondrial and energetic stress factors during the process of neuronal differentiation. Conclusions: In terms of both morphology and energetic stress response, differentiated SH-SY5Y cells are more similar to dopaminergic neurons than their undifferentiated precursors. Thanks to dividing progenitors and the short duration of differentiation, combined with the use of specific endpoints analysed with high-content microscopy, our platform paves the way for high throughput experiments on a neuronal cell culture model for PD. Our genomic characterization and expression profiling of SH-SY5Y cells furthermore helps guide the experimental design and interpretation of such studies.

Peer Reviewed
Comparing the MicroRNA Spectrum between Serum and Plasma
Wang, K.; Yuan, Y.; Cho, J. H. et al

in PLoS ONE (2012), 7(7), e41561

MicroRNAs (miRNAs) are small, non-coding RNAs that regulate various biological processes, primarily through interaction with messenger RNAs. The levels of specific, circulating miRNAs in blood have been shown to associate with various pathological conditions including cancers. These miRNAs have great potential as biomarkers for various pathophysiological conditions. In this study we focused on different sample types’ effects on the spectrum of circulating miRNA in blood. Using serum and corresponding plasma samples from the same individuals, we observed higher miRNA concentrations in serum samples compared to the corresponding plasma samples. The difference between serum and plasma miRNA concentration showed some associations with miRNA from platelets, which may indicate that the coagulation process may affect the spectrum of extracellular miRNA in blood. Several miRNAs also showed platform-dependent variations in measurements. Our results suggest that there are a number of factors that might affect the measurement of circulating miRNA concentration. Caution must be taken when comparing miRNA data generated from different sample types or measurement platforms.

Peer Reviewed
Probabilistic Logic Methods and Some Applications to Biology and Medicine
Sakhanenko, Nikita A.; Galas, David J. UL

in Journal of Computational Biology (2012), 19(3), 316-336

For the computational analysis of biological problems (analyzing data, inferring networks and complex models, and estimating model parameters) it is common to use a range of methods based on probabilistic logic constructions, sometimes collectively called machine learning methods. Probabilistic modeling methods such as Bayesian Networks (BN) fall into this class, as do Hierarchical Bayesian Networks (HBN), Probabilistic Boolean Networks (PBN), Hidden Markov Models (HMM), and Markov Logic Networks (MLN). In this review, we describe the most general of these (MLN), and show how the above-mentioned methods are related to MLN and one another by the imposition of constraints and restrictions. This approach allows us to illustrate a broad landscape of constructions and methods, and describe some of the attendant strengths, weaknesses, and constraints of many of these methods. We then provide some examples of their applications to problems in biology and medicine, with an emphasis on genetics. The key concepts needed to picture this landscape of methods are the ideas of probabilistic graphical models, the structures of the graphs, and the scope of the logical language repertoire used (from First-Order Logic [FOL] to Boolean logic). These concepts are interlinked and together define the nature of each of the probabilistic logic methods. Finally, we discuss the initial applications of MLN to genetics, show the relationship to less general methods like BN, and then mention several examples where such methods could be effective in new applications to specific biological and medical problems.

Peer Reviewed
Relations between the set-complexity and the structure of graphs and their sub-graphs
Ignac, Tomasz UL; Sakhanenko, Nikita; Galas, David J. UL

in EURASIP Journal on Bioinformatics and Systems Biology (2012), 13

We describe some new conceptual tools for the rigorous, mathematical description of the “set-complexity” of graphs. This set-complexity has been shown previously to be a useful measure for analyzing some biological networks, and in discussing biological information in a quantitative fashion. The advances described here allow us to define some significant relationships between the set-complexity measure and the structure of graphs, and of their component sub-graphs. We show here that modular graph structures tend to maximize the set-complexity of graphs. We point out the relationship between modularity and redundancy, and discuss the significance of set-complexity in this regard. We specifically discuss the relationship between complexity and entropy in the case of complete-bipartite graphs, and present a new method for constructing highly complex, binary graphs. These results can be extended to the case of ternary graphs, and to other multi-edge graphs, which are fundamentally more relevant to biological structures and systems. Finally, our results lead us to an approach for extracting high complexity modular graphs from large, noisy graphs with low information content. We illustrate this approach with two examples.

Peer Reviewed
RNASEQR--a streamlined and accurate RNA-seq sequence analysis program
Chen, Leslie Y.; Wei, Kuo-Chen; Huang, Abner C.-Y. et al

in Nucleic Acids Research (2012), 40(6), 42-

The next-generation sequencing (NGS) technologies-based transcriptomic profiling method, often called RNA-seq, has been widely used to study global gene expression, alternative exon usage, new exon discovery, novel transcriptional isoforms and genomic sequence variations. However, this technique also poses many biological and informatics challenges to extracting meaningful biological information. The RNA-seq data analysis is built on the foundation of high quality initial genome localization and alignment information for RNA-seq sequences. Toward this goal, we have developed RNASEQR to accurately and effectively map millions of RNA-seq sequences. We have systematically compared RNASEQR with four of the most widely used tools using a simulated data set created from the Consensus CDS project and two experimental RNA-seq data sets generated from a human glioblastoma patient. Our results showed that RNASEQR yields more accurate estimates for gene expression, complete gene structures and new transcript isoforms, as well as more accurate detection of single nucleotide variants (SNVs). RNASEQR analyzes raw data from RNA-seq experiments effectively and outputs results in a manner that is compatible with a wide variety of specialized downstream analyses on desktop computers.

Peer Reviewed
Complexity of Networks II: The Set Complexity of Edge-Colored Graphs
Ignac, Tomasz UL; Sakhanenko, N. A.; Galas, David J. UL

in Complexity (2012), 17(5), 23-36

We previously introduced the concept of “set-complexity”, based on a context-dependent measure of information, and used this concept to describe the complexity of gene interaction networks. In the previous paper in this series we analyzed the set-complexity of binary graphs. Here we extend this analysis to graphs with multi-colored edges that more closely match biological structures like the gene interaction networks. All highly complex graphs by this measure exhibit a modular structure. A principal result of this work is that for the most complex graphs of a given size the number of edge colors is equal to the number of “modules” of the graph. Complete multipartite graphs (CMGs) are defined and analyzed, and the relation between complexity and structure of these graphs is examined in detail. We establish that the mutual information between any two nodes in a CMG can be fully expressed in terms of entropy, and present an explicit expression for the set complexity of CMGs (Theorem 3). An algorithm for generating highly complex graphs from CMGs is described. We establish several theorems relating these concepts and connecting complex graphs with a variety of practical network properties. In exploring the relation between symmetry and complexity we use the idea of a similarity matrix and its spectrum for highly complex graphs.

Peer Reviewed
New methods for finding associations in large data sets: Generalizing the maximal information coefficient (MIC)
Ignac, Tomasz UL; Sakhanenko, N. A.; Skupin, Alexander UL et al

in Proceedings of the Ninth International Workshop on Computational Systems Biology (2012)

We propose here a natural, but substantive, extension of the MIC. Defined for two variables, MIC has a distinct advantage for detecting potentially complex dependencies. Our extension provides a similar means for dependencies among three variables. This itself is an important step for practical applications. We show that by merging two concepts, the interaction information, which is a generalization of the mutual information to three variables, and the normalized information distance, which measures informational sharing between two variables, we can extend the fundamental idea of MIC. Our results also exhibit some attractive properties that should be useful for practical applications in data analysis. Finally, the conceptual and mathematical framework presented here can be used to generalize the idea of MIC to the multi-variable case.
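
To make the phrase "generalization of the mutual information to three variables" concrete, here is a minimal sketch of interaction information estimated from discrete samples. It is not the authors' MIC extension. It assumes the sign convention I(X;Y;Z) = H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z), which is one of two conventions in use; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a list of hashable observations."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def interaction_information(xs, ys, zs):
    """Three-variable interaction information under the sign convention assumed above."""
    return (
        entropy(xs) + entropy(ys) + entropy(zs)
        - entropy(list(zip(xs, ys)))
        - entropy(list(zip(xs, zs)))
        - entropy(list(zip(ys, zs)))
        + entropy(list(zip(xs, ys, zs)))
    )


# Toy data: every combination of two fair bits, with a third variable equal to their XOR.
bits = list(product([0, 1], repeat=2))
xs = [a for a, _ in bits]
ys = [b for _, b in bits]
xor = [a ^ b for a, b in bits]

if __name__ == "__main__":
    print(interaction_information(xs, ys, xor))
```

In the XOR toy case no pair of variables shares information, yet the three are jointly dependent, so the value is nonzero (negative under this convention). Pairwise measures alone cannot detect this kind of dependency.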
