References of "Jensen, Lars Juhl"
     in
Bookmark and Share    
Full Text
Peer Reviewed
See detailDISEASES: text mining and data integration of disease-gene associations.
Pletscher-Frankild, Sune; Palleja, Albert; Tsafou, Kalliopi et al

in Methods (2015), 74

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The ... [more ▼]

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease-gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download. [less ▲]

Detailed reference viewed: 116 (6 UL)
Full Text
Peer Reviewed
See detailCOMPARTMENTS: unification and visualization of protein subcellular localization evidence.
Binder, Janos X. UL; Pletscher-Frankild, Sune; Tsafou, Kalliopi et al

in Database: the Journal of Biological Databases and Curation (2014), 2014

Information on protein subcellular localization is important to understand the cellular functions of proteins. Currently, such information is manually curated from the literature, obtained from high ... [more ▼]

Information on protein subcellular localization is important to understand the cellular functions of proteins. Currently, such information is manually curated from the literature, obtained from high-throughput microscopy-based screens and predicted from primary sequence. To get a comprehensive view of the localization of a protein, it is thus necessary to consult multiple databases and prediction tools. To address this, we present the COMPARTMENTS resource, which integrates all sources listed above as well as the results of automatic text mining. The resource is automatically kept up to date with source databases, and all localization evidence is mapped onto common protein identifiers and Gene Ontology terms. We further assign confidence scores to the localization evidence to facilitate comparison of different types and sources of evidence. To further improve the comparability, we assign confidence scores based on the type and source of the localization evidence. Finally, we visualize the unified localization evidence for a protein on a schematic cell to provide a simple overview. Database URL: http://compartments.jensenlab.org. [less ▲]

Detailed reference viewed: 151 (2 UL)
Full Text
Peer Reviewed
See detailRecalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse.
Orlando, Ludovic; Ginolhac, Aurélien UL; Zhang, Guojie et al

in Nature (2013), 499(7456), 74-8

The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560 ... [more ▼]

The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication. [less ▲]

Detailed reference viewed: 172 (28 UL)
Full Text
Peer Reviewed
See detailOnTheFly 2.0: a tool for automatic annotation of files and biological information extraction.
Pafilis, Evangelos; Pavlopoulos, Georgios; Satagopam, Venkata UL et al

Poster (2013)

Retrieving all of the necessary information from databases about bioentities mentioned in an article is not a trivial or an easy task. Following the daily literature about a specific biological topic and ... [more ▼]

Retrieving all of the necessary information from databases about bioentities mentioned in an article is not a trivial or an easy task. Following the daily literature about a specific biological topic and collecting all the necessary information about the bioentities mentioned in the literature manually is tedious and time consuming. OnTheFly 2.0 is a web application mainly designed for non-computer experts which aims to automate data collection and knowledge extraction from biological literature in a user friendly and efficient way. OnTheFly 2.0 is able to extract bioentities from individual articles such as text, Microsoft Word, Excel and PDF files. With a simple drag-and-drop motion, the text of a document is extensively parsed for bioentities such as protein/gene names and chemical compound names. Utilizing high quality data integration platforms, OnTheFly allows the generation of informative summaries, interaction networks and at-a-glance popup windows containing knowledge related to the bioentities found in documents. OnTheFly 2.0 provides a concise application to automate the extraction of bioentities hidden in various documents and is offered as a web based application. [less ▲]

Detailed reference viewed: 113 (1 UL)
Full Text
Peer Reviewed
See detailSuperTarget and Matador: resources for exploring drug-target relationships
Guenther, Stefan; Kuhn, Michael; Dunkel, Mathias et al

in Nucleic Acids Research (2008), 36(SI), 919-922

The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical ... [more ▼]

The molecular basis of drug action is often not well understood. This is partly because the very abundant and diverse information generated in the past decades on drugs is hidden in millions of medical articles or textbooks. Therefore, we developed a one-stop data warehouse, SuperTarget that integrates drug-related information about medical indication areas, adverse drug effects, drug metabolization, pathways and Gene Ontology terms of the target proteins. An easy-to-use query interface enables the user to pose complex queries, for example to find drugs that target a certain pathway, interacting drugs that are metabolized by the same cytochrome P450 or drugs that target the same protein but are metabolized by different enzymes. Furthermore, we provide tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs; the vast majority of entries have pointers to the respective literature source. A subset of these drugs has been annotated with additional binding information and indirect interactions and is available as a separate resource called Matador. SuperTarget and Matador are available at http://insilico.charite.de/supertarget and http://matador.embl.de. [less ▲]

Detailed reference viewed: 159 (0 UL)