Dissertations and theses : Doctoral thesis
Engineering, computing & technology : Computer science
Systems Biomedicine
Diversity Preserving Genetic Algorithms - Application to the Inverted Folding Problem and Analogous Formulated Benchmarks
Nielsen, Sune Steinbjorn [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)]
University of Luxembourg, Luxembourg
Docteur de l’Université du Luxembourg en Informatique
Bouvry, Pascal
Schneider, Reinhard
Talbi, El-Ghazali
Danoy, Grégoire
Jurkowski, Wiktor
[en] Genetic Algorithms ; Inverted Folding Problem ; Diversity Preservation
[en] Protein structure prediction is an essential step in understanding the molecular mechanisms of living cells with widespread applications in biotechnology and health.
Among the open problems in the field, the Inverse Folding Problem (IFP), which consists in finding sequences that fold into a defined structure, is an important research problem in its own right and lies at the heart of most rational protein design approaches. In brief, solutions to the IFP are protein sequences that fold into a given protein structure, in contrast to conventional structure prediction, where the solution is the structure into which a given sequence folds. This inverse approach is viewed as a simplification because the near-infinite number of structural conformations of a protein can be disregarded and only sequence-to-structure compatibility needs to be determined. Additional emphasis was placed on generating many sequences dissimilar from the known reference sequence, rather than finding a single solution. To solve the IFP computationally, a novel formulation of the problem was proposed in which candidate solutions are evaluated in terms of their predicted secondary structure match. In addition, two specialised Genetic Algorithms (GAs) were developed specifically for solving the IFP and compared with existing algorithms in terms of performance. Experimental results showed the superior performance of the developed algorithms, both in terms of model score and diversity of the generated sets of solutions, i.e. new protein sequences. A number of landscape analysis experiments were conducted on the IFP model, enabling the development of an original benchmark suite of analogous problems. These benchmarks were shown to share many characteristics with their IFP model counterparts, but execute in a fraction of the time. To validate the IFP model and the algorithm output, a subset of the generated solutions was selected for further inspection through full tertiary structure prediction and comparison to the original protein structure. Congruence was then assessed by superposition and secondary structure annotation statistics. The results demonstrated that an optimisation process relying on a fast secondary structure approximation, such as the IFP model, can yield meaningful sequences.
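The two quantities the abstract refers to, a secondary structure match score and the diversity of a solution set, can be illustrated with a minimal sketch. This is not the thesis's actual model: the per-residue label alphabet (H/E/C), the position-wise match fraction, and the normalised Hamming diversity are all assumptions chosen for illustration, and the predicted structure string is assumed to come from some external secondary structure predictor that is not shown here.

```python
from itertools import combinations


def structure_match(predicted: str, target: str) -> float:
    """Fraction of positions whose predicted label equals the target label.

    Both arguments are per-residue label strings (assumed alphabet: H, E, C).
    """
    if len(predicted) != len(target):
        raise ValueError("structure strings must have equal length")
    if not target:
        return 0.0
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_diversity(sequences: list[str]) -> float:
    """Mean normalised Hamming distance over all pairs of equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    length = len(sequences[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * length)
```

Under these assumptions, a diversity-preserving algorithm would aim to keep `structure_match` high for each candidate while also keeping `mean_pairwise_diversity` of the returned set high, instead of converging on near-copies of one sequence.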
University of Luxembourg: High Performance Computing - ULHPC

File(s) associated to this reference

Fulltext file(s):

Open access
Thesis_Sune_S_Nielsen_24_02_2016_updated_17_08_2016_final.pdf (Author preprint, 6.83 MB)
