Glaab, Enrico in PLoS ONE (2012), 7(7), 39932-39932
Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL’s classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.

; ; Glaab, Enrico in Proceedings of the National Academy of Sciences of the United States of America (2011), 108(23), 9709-9714

; ; Glaab, Enrico in Breast Cancer Research and Treatment (2011), 128(2), 315-326
Global gene expression profiling studies have classified breast cancer into a number of distinct biological and molecular classes with clinical relevance. The heterogeneous luminal group, which is largely characterised by oestrogen receptor (ER) expression, appears to contain distinct subgroups with differing behaviour. In this study, we analysed 47,293 gene transcripts in 128 invasive breast carcinomas (BC) using Artificial Neural Networks and a cross-validation analysis in combination with an ensemble sample classification to identify genes that can be used to subclassify ER+ luminal tumours. The results were validated using immunohistochemistry on TMAs containing 1,140 invasive breast cancers. Our results showed that the RERG gene is one of the highest ranked genes to differentiate between ER+ luminal-like and ER- non-luminal cancers based on a 10-fold external cross-validation analysis with an average classification accuracy of 89%. This was confirmed in our protein expression studies that showed RERG positive associations with markers of luminal differentiation including ER, luminal cytokeratins (CK19, CK18 and CK7/8) and FOXA1 (P = 0.004) and other markers of good prognosis in BC including small size, lower histologic grade and positive expression of androgen receptor, nuclear BRCA1, FHIT and cell cycle inhibitors p27 and p21. RERG expression was inversely associated with the proliferation marker MIB1 (P = 0.005) and p53.
Strong RERG expression showed an association with longer breast cancer specific survival and distant metastasis free interval in the whole series as well as in the ER+ luminal group and these associations were independent of other prognostic variables. In conclusion, we used novel bioinformatics methods to identify candidate genes to characterise ER+ luminal-like breast cancer. RERG gene is a key marker of the luminal BC class and can be used to separate distinct prognostic subgroups.

Glaab, Enrico in Journal of Statistical Software (2010), 36(8), 1-18
The 3-dimensional representation and inspection of complex data is a frequently used strategy in many data analysis domains. Existing data mining software often lacks functionality that would enable users to explore 3D data interactively, especially if one wishes to make dynamic graphical representations directly viewable on the web. In this paper we present vrmlgen, a software package for the statistical programming language R to create 3D data visualizations in web formats like the Virtual Reality Markup Language (VRML) and LiveGraphics3D. vrmlgen can be used to generate 3D charts and bar plots, scatter plots with density estimation contour surfaces, and visualizations of height maps, 3D object models and parametric functions. For greater flexibility, the user can also access low-level plotting methods through a unified interface and freely group different function calls together to create new higher-level plotting methods. Additionally, we present a web tool allowing users to visualize 3D data online and test some of vrmlgen's features without the need to install any software on their computer.

Glaab, Enrico in Bioinformatics (2010), 26(9), 1271-1272
TopoGSA (Topology-based Gene Set Analysis) is a web-application dedicated to the computation and visualization of network topological properties for gene and protein sets in molecular interaction networks. Different topological characteristics, such as the centrality of nodes in the network or their tendency to form clusters, can be computed and compared with those of known cellular pathways and processes.

Glaab, Enrico in BMC Bioinformatics (2010), 11(1), 597-597