Doctoral thesis (Dissertations and theses)
In Silico prediction of transcription factor binding sites by probabilistic models
WIENECKE-BALDACCHINO, Anke
2012
 

Documents


Full text
Wienecke - Thesis.pdf
Author postprint (33.64 MB)

All documents in ORBilu are protected by a user license.

Details



Keywords:
UniPROBE; PWM; Chow-Liu Trees; Ensemble of Trees; Regulatory SNPs; Family Trio Data; Differential Binding Detection; Machine Learning
Abstract:
The characterization of transcription factor binding sites detected in silico is a fundamental problem in the analysis of regulatory gene expression. Several approaches have been proposed to model DNA-protein interactions, and they fall into two main classes: qualitative models based on a consensus sequence, and quantitative models that provide a measure of binding affinity. The latter can be further subdivided into models that assume an independent contribution from each nucleotide of a potential binding site and more flexible models that allow for positional interdependence. This work investigates the applicability of three probabilistic models to the prediction of transcription factor binding sites: (i) the simple position weight matrix (PWM), which assumes independence, and two flexible models that capture positional interdependencies, namely (ii) a Chow-Liu Tree model and (iii) an Ensemble of Trees model. Training and validation of the models on the Mus musculus subset of the UniPROBE database revealed that the complex models provide better predictive power, suggesting that a large number of transcription factor binding motifs are affected by positional interdependencies. In addition, numerous transcription factors were identified for which the Ensemble of Trees model outperformed both the Chow-Liu Tree model and the PWM model. The models trained on UniPROBE data were then applied in a biological context: the prediction of differential binding profiles in five different ChIP-seq samples, followed by the detection of causative regulatory SNPs. The chosen set-up involved family trio data, that is, genotype data from a family consisting of father, mother and daughter, which provides internal validation. The models correctly classify true negatives in an independent biological sample, as reflected by a high specificity. The approach applied to detect causative regulatory SNPs resulted in a candidate list of 20 SNPs. These candidates are strongly supported by epigenetic markers, by the model-predicted binding affinity of the binding site containing each SNP, and by significant p-values describing the effect of the nucleotide exchange.
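As an illustration of the independence assumption behind the PWM model mentioned in the abstract, the following is a minimal sketch and not the implementation used in the thesis. It builds a log-odds PWM from a handful of aligned sites and compares the score of a reference site with that of a variant carrying a single nucleotide exchange. The function names (`build_pwm`, `score`), the toy sites, the pseudocount of 1 and the uniform background of 0.25 are all assumptions made for this example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, aligned binding sites.

    The pseudocount and the uniform background are assumptions of this
    sketch, not parameters taken from the thesis.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount to avoid log(0).
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background)
                    for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds values.

    Adding the columns independently is exactly the positional
    independence assumption of a PWM.
    """
    return sum(column[base] for column, base in zip(pwm, seq))

# Hypothetical toy sites, not data from UniPROBE or the thesis.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
pwm = build_pwm(sites)

ref = "TGACTCA"
alt = "TGACTAA"  # same site with one nucleotide exchanged (C to A)
delta = score(pwm, alt) - score(pwm, ref)
print(f"reference: {score(pwm, ref):.2f}  variant: {score(pwm, alt):.2f}  "
      f"difference: {delta:.2f}")
```

A negative difference indicates that, under this toy model, the exchanged nucleotide lowers the predicted affinity. Tree-based models such as the Chow-Liu Tree replace the independent per-position terms with terms conditioned on another position, which this sketch does not attempt to show.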
Disciplines:
Biochemistry, biophysics & molecular biology
Author, co-author:
WIENECKE-BALDACCHINO, Anke ;  University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Life Science Research Unit
Document language:
English
Title:
In Silico prediction of transcription factor binding sites by probabilistic models
Defense date:
28 September 2012
Institution:
Unilu - University of Luxembourg, Luxembourg, Luxembourg
Degree title:
Docteur en Biologie
Supervisor:
Available on ORBilu:
since 11 February 2014

Statistics


Number of views
229 (including 15 from Unilu)
Number of downloads
494 (including 12 from Unilu)
