Article (Scientific journals)
Biological data analysis as an information theory problem: multivariable dependence measures and the shadows algorithm
Sakhanenko, Nikita A.; Galas, David J.
2015, in Journal of Computational Biology, 22 (11), p. 1005-1024
Peer Reviewed verified by ORBi
 

Files


Full Text
Biological Data Analysis as an Information Theory Problem Multivariable Dependence Measures and the Shadows Algorithm.pdf
Publisher postprint (2.82 MB)

Details



Keywords :
discovery; entropy; gene network; interaction information; multivariable dependency
Abstract :
Information theory is valuable in multiple-variable analysis because it is model-free and nonparametric, and because it is only modestly sensitive to undersampling. We previously introduced a general approach to finding multiple dependencies that provides an accurate measure of the level of dependency for each subset of variables in a data set; the measure is significantly nonzero only if the subset of variables is collectively dependent. This is useful, however, only if we can avoid a combinatorial explosion of calculations as the number of variables increases. The proposed dependence measure for a subset of variables tau, the differential interaction information Delta(tau), has the property that for subsets of tau some of the factors of Delta(tau) are significantly nonzero when the full dependence includes more variables. We use this property to suppress the combinatorial explosion by following the "shadows" of multivariable dependency on smaller subsets. Rather than calculating the marginal entropies of all subsets at each degree level, we need to consider only subsets of variables with appropriate "shadows." The number of calculations for n variables at a degree level of d therefore grows at a much smaller rate than the binomial coefficient (n, d), and depends on the parameters of the "shadows" calculation. By avoiding a combinatorial explosion, this approach enables the use of our multivariable measures on very large data sets. We demonstrate the method on simulated data sets and characterize the effects of noise and sample numbers. In addition, we analyze a data set of a few thousand mutant yeast strains interacting with a few thousand chemical compounds.
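The entropy-based measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include the shadows search itself. It assumes that each factor of the measure is an alternating sum of marginal entropies over the subsets that contain a chosen target variable, with a sign convention chosen so that for three variables the factor equals the conditional mutual information I(X;Y|Z); the paper's own convention and overall sign may differ. The function names `entropy`, `delta_factor`, and `symmetric_delta` are hypothetical, and the two tiny data sets are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Plug-in Shannon entropy (in bits) of the joint distribution of columns `cols`."""
    if not cols:
        return 0.0
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def delta_factor(rows, subset, target):
    """One factor of the dependence measure for `subset`, relative to `target`.

    Assumed form: sum over all tau with target in tau, tau a subset of `subset`,
    of (-1)^|tau| * H(tau). For three variables this is I(X;Y|target).
    """
    others = [c for c in subset if c != target]
    total = 0.0
    for r in range(len(others) + 1):
        for rest in combinations(others, r):
            tau = rest + (target,)
            total += (-1) ** len(tau) * entropy(rows, tau)
    return total


def symmetric_delta(rows, subset):
    """Product of the factors over every choice of target (overall sign omitted)."""
    product = 1.0
    for target in subset:
        product *= delta_factor(rows, subset, target)
    return product


# Illustrative data: Z = X xor Y (collectively dependent) vs. three independent bits.
xor_rows = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
indep_rows = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

xor_value = symmetric_delta(xor_rows, (0, 1, 2))
indep_value = symmetric_delta(indep_rows, (0, 1, 2))
```

In this sketch the product is nonzero for the XOR triple, where all three variables are needed to see the dependence, and zero for the independent triple. Evaluating such a product for every subset of size d is what produces the binomial growth that the shadows algorithm is designed to avoid.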
Research center :
Luxembourg Centre for Systems Biomedicine (LCSB): Experimental Neurobiology (Balling Group)
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Sakhanenko, Nikita A.
Galas, David J.; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB)
External co-authors :
yes
Language :
English
Title :
Biological data analysis as an information theory problem: multivariable dependence measures and the shadows algorithm
Publication date :
2015
Journal title :
Journal of Computational Biology
ISSN :
1557-8666
Publisher :
Mary Ann Liebert, Inc.
Volume :
22
Issue :
11
Pages :
1005-1024
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBilu :
since 08 April 2016
