References of "Vega Moreno, Carlos Gonzalo 50033310"
Full Text
Peer Reviewed
From Hume to Wuhan: An Epistemological Journey on the Problem of Induction in COVID-19 Machine Learning Models and its Impact Upon Medical Research
Vega Moreno, Carlos Gonzalo UL

in IEEE Access (2021), 9

Advances in computer science have transformed the way artificial intelligence is employed in academia, with Machine Learning (ML) methods easily available to researchers from diverse areas thanks to intuitive frameworks that yield extraordinary results. Nevertheless, current trends in the mainstream ML community tend to emphasise "wins" over knowledge, putting the scientific method aside and focusing on maximising metrics of interest. Methodological flaws lead to poor justification of method choice, which in turn leads researchers to disregard the limitations of the methods employed, ultimately putting at risk the translation of solutions into real-world clinical settings. This work exemplifies the impact of the problem of induction in medical research by studying the methodological issues of recent solutions for computer-aided diagnosis of COVID-19 from chest X-ray images.

Full Text
Supporting findability of COVID-19 research with large-scale text mining of scientific publications
Welter, Danielle UL; Vega Moreno, Carlos Gonzalo UL; Biryukov, Maria UL et al

Poster (2020, November 27)

When the COVID-19 pandemic hit in early 2020, many research efforts were quickly redirected towards studies on SARS-CoV-2 and COVID-19, from the sequencing and assembly of viral genomes to the elaboration of robust testing methodologies and the development of treatment and vaccination strategies. At the same time, a flurry of scientific publications around SARS-CoV-2 and COVID-19 began to appear, making it increasingly difficult for researchers to stay up to date with the latest trends and developments in this rapidly evolving field. The BioKB platform is a pipeline which, by exploiting text mining and semantic technologies, helps researchers easily access the semantic content of thousands of abstracts and full-text articles. The content of the articles is analysed, and concepts from a range of contexts, including proteins, species, chemicals, diseases and biological processes, are tagged based on existing dictionaries of controlled terms. Co-occurring concepts are classified based on their asserted relationship, and the resulting subject-relation-object triples are stored in a publicly accessible human- and machine-readable knowledge base. All concepts in the BioKB dictionaries are linked to stable, persistent identifiers: either a resource accession such as an Ensembl, UniProt or PubChem ID for genes, proteins and chemicals, or an ontology term ID for diseases, phenotypes and other ontology terms. In order to improve COVID-19-related text mining, we extended the underlying dictionaries to include many additional viral species (via NCBI Taxonomy identifiers), phenotypes from the Human Phenotype Ontology (HPO), COVID-related concepts including clinical and laboratory tests from the COVID-19 ontology, as well as additional diseases (DO) and biological processes (GO). We also added all viral proteins found in UniProt and gene entries from EntrezGene to increase the sensitivity of the text mining pipeline to viral data.
To date, BioKB has indexed over 270’000 sentences from 21’935 publications relating to coronavirus infections, with publications dating from 1963 to 2021, 3’863 of which were published this year. We are currently working to further refine the text mining pipeline by training it on the extraction of increasingly complex relations such as protein-phenotype relationships. We are also regularly adding new terms to our dictionaries for areas where coverage is currently low, such as clinical and laboratory tests and procedures and novel drug treatments.

Full Text
BioKC: a platform for quality controlled curation and annotation of systems biology models
Vega Moreno, Carlos Gonzalo UL; Groues, Valentin UL; Ostaszewski, Marek UL et al

Scientific Conference (2020, September 04)

Standardisation of biomedical knowledge into systems biology models is essential for the study of biological function. However, biomedical knowledge curation is a laborious manual process aggravated by the ever-increasing growth of biomedical literature. High-quality curation currently relies on pathway databases where outsider participation is minimal. The increasing demand for systems biology knowledge presents new challenges regarding curation, calling for new collaborative functionalities to improve quality control of the review process. These features are missing in the current systems biology environment, whose tools are not well suited for an open, community-based model curation workflow. On the one hand, diagram editors such as CellDesigner or Newt provide limited annotation features. On the other hand, most popular text annotation tools are not aimed at biomedical text annotation or model curation. Detaching the model curation and annotation tasks from diagram editing improves model iteration and centralizes the annotation of such models with supporting evidence. In this vein, we present BioKC, a web-based platform for systematic, quality-controlled, collaborative curation and annotation of biomedical knowledge following the standard data model from Systems Biology Markup Language (SBML).

Full Text
BioKC: a collaborative platform for systems biology model curation and annotation
Vega Moreno, Carlos Gonzalo UL; Groues, Valentin UL; Ostaszewski, Marek UL et al

in bioRxiv (2020)

Curation of biomedical knowledge into standardised and inter-operable systems biology models is essential for studying complex biological processes. However, systems-level curation is a laborious manual process, especially when facing the ever-increasing growth of domain literature. Currently, these systems-level curation efforts concentrate around dedicated pathway databases, with limited input from the research community. The demand for systems biology knowledge increases with new findings demonstrating elaborate relationships between multiple molecules, pathways and cells. This new challenge calls for novel collaborative tools and platforms that improve the quality and the output of the curation process. In particular, in the current systems biology environment, curation tools lack reviewing features and are not well suited for open, community-based curation workflows. An important concern is the complexity of the curation process and the limitations of the tools supporting it. Currently, systems-level curation combines model-building with diagram layout design. However, diagram editing tools offer limited annotation features. On the other hand, text-oriented tools have insufficient capabilities for representing and annotating relationships between biological entities. Separating model curation and annotation from diagram editing enables iterative and distributed building of annotated models. Here, we present BioKC (Biological Knowledge Curation), a web-based collaborative platform for the curation and annotation of biomedical knowledge following the standard data model from Systems Biology Markup Language (SBML). Competing Interest Statement: The authors have declared no competing interest.

Full Text
Peer Reviewed
Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis
Vega Moreno, Carlos Gonzalo UL; Zazo, José Fernando; Meyer, Hugo et al

in Vega Moreno, Carlos Gonzalo (Ed.) 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) (2018, February 15)

Traditional data centers are designed with a rigid architecture of fit-for-purpose servers that provision resources beyond the average workload in order to deal with occasional peaks of data. Heterogeneous data centers are pushing towards more cost-efficient architectures with better resource provisioning. In this paper we study the feasibility of using disaggregated architectures for data-intensive applications, in contrast to the monolithic approach of server-oriented architectures. In particular, we have tested a proactive network analysis system in which the workload demands are highly variable. In the context of the dReDBox disaggregated architecture, the results show that the overhead caused by using remote memory resources is significant, between 66% and 80%, but we have also observed that the memory usage is one order of magnitude higher for the stress case than for average workloads. Therefore, dimensioning memory for the worst case in conventional systems will result in a notable waste of resources. Finally, we found that, for the selected use case, parallelism is limited by memory. Therefore, using a disaggregated architecture will allow for increased parallelism, which, at the same time, will mitigate the overhead caused by remote memory.

Peer Reviewed
KISS Methodologies for Network Management and Anomaly Detection
Vega Moreno, Carlos Gonzalo UL; Aracil, Javier; Magaña, Eduardo

in Vega Moreno, Carlos Gonzalo; Aracil, Javier; Magaña, Eduardo (Eds.) KISS Methodologies for Network Management and Anomaly Detection (2018)

Current networks are growing in size, complexity and the amount of monitoring data that they produce, which requires complex data analysis pipelines to handle data collection, centralization and analysis tasks. Approaches in the literature include the use of custom agents to harvest information and large data centralization systems based on clusters to achieve horizontal scalability, which are expensive and difficult to deploy in real scenarios. In this paper we propose and evaluate a series of methodologies for network management, deployed in real industrial production environments and ranging from the architecture design to the visualization system and the anomaly detection methods, that aim to make full use of vertical resources and overcome the difficulties of data collection and centralization.

Peer Reviewed
On the design and performance evaluation of automatic traffic report generation systems with huge data volumes
Vega Moreno, Carlos Gonzalo UL; Miravalls Sierra, Eduardo; Julián Moreno, Guillermo et al

in International Journal of Network Management (2018), 28(6), 2044

In this paper, we analyze the performance issues involved in the generation of automated traffic reports for large IT infrastructures. Such reports allow the IT manager to proactively detect possible abnormal situations and roll out the corresponding corrective actions. With the ever-increasing bandwidth of current networks, the design of automated traffic report generation systems is very challenging. In a first step, the huge volumes of collected traffic are transformed into enriched flow records obtained from diverse collectors and dissectors. Then, such flow records, along with time series obtained from the raw traffic, are further processed to produce a usable report. As will be shown, the data volume in flow records turns out to be very large as well and requires careful selection of the key performance indicators (KPIs) to be included in the report. In this regard, we discuss the use of high-level languages versus low-level approaches, in terms of speed and versatility. Furthermore, our design approach is targeted at rapid development on commodity hardware, which is essential to cost-effectively tackle demanding traffic analysis scenarios. The paper shows the feasibility of delivering a large number of KPIs, as will be detailed later, for several TBytes of traffic per day using a commodity hardware architecture and high-level languages.

Full Text
Peer Reviewed
Multi-Gbps HTTP traffic analysis in commodity hardware based on local knowledge of TCP streams
Vega Moreno, Carlos Gonzalo UL; Roquero, Paula; Aracil, Javier

in Computer Networks (2017), 113

In this paper we propose and implement novel techniques for performance evaluation of web traffic (response time, response code, etc.) that do not reassemble the underlying TCP connection, since reassembly severely restricts the traffic analysis throughput. Furthermore, our proposed software for HTTP traffic analysis runs on standard hardware, which is very cost-effective. In addition, we present sub-TCP connection load balancing techniques that significantly increase throughput at the expense of losing very few HTTP transactions. Such techniques provide performance evaluation statistics which are indistinguishable from the single-threaded alternative with full TCP connection reassembly. © 2017 Elsevier B.V.

Full Text
Peer Reviewed
Loginson: a transform and load system for very large-scale log analysis in large IT infrastructures
Vega Moreno, Carlos Gonzalo UL; Roquero, Paula; Leira, Rafael et al

in Journal of Supercomputing (2017), 73(9), 3879-3900

Nowadays, most systems and applications produce log records that are useful for security and monitoring purposes such as debugging programming errors, checking system status, and detecting configuration problems or even attacks. To this end, a log repository becomes necessary whereby logs can be accessed and visualized in a timely manner. This paper presents Loginson, a high-performance log centralization system for large-scale log collection and processing in large IT infrastructures. Besides log collection, Loginson provides high-level analytics through a visual interface for the purpose of troubleshooting critical incidents. We note that Loginson outperforms all of the other log centralization solutions by taking full advantage of vertical scalability, thereby decreasing Capital Expenditure (CAPEX) and Operating Expense (OPEX) costs for deployment scenarios with a huge volume of log data. © 2017, Springer Science+Business Media New York.
