Keyphrase extraction from single textual documents based on semantically defined background knowledge and co-occurrence graphs

DALLE LUCCA TOSI, Mauro; Reis, Julio Cesar Dos

Article (Scientific journals)

2021 • In International Journal of Metadata, Semantics and Ontologies, 15 (2), p. 121–132

Peer reviewed

Permalink
https://hdl.handle.net/10993/52022

Files

Full Text

2021_IJMSO.pdf

Publisher postprint (539.08 kB)

Details

Abstract :

Keyphrase extraction is a fundamental and challenging task that aims to automatically extract a set of keyphrases from textual documents. Keyphrases are short phrases, composed of one or more terms, that best represent a textual document and its main topics. They assist publishers in indexing documents and help readers identify the most relevant ones. In this article, we extend our research on C-Rank, an unsupervised approach that automatically extracts keyphrases from single documents. C-Rank uses concept linking to identify the concepts that a single document shares with an external background knowledge base. These concepts serve as candidate keyphrases and are modeled in a co-occurrence graph; keyphrases are then extracted based on heuristics and on the candidates' centrality in the graph. We advance our study of C-Rank by evaluating it with two different concept-linking approaches: Babelfy and DBpedia Spotlight. The evaluation was performed on five gold-standard datasets composed of distinct types of data: academic articles, academic abstracts, and news articles. An experimental comparison with existing unsupervised approaches indicates that C-Rank achieves state-of-the-art results in extracting keyphrases from scientific documents.
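The graph-based ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the candidate concepts have already been produced by a concept linker (Babelfy or DBpedia Spotlight) and are given as a list in document order. The sliding-window size, the edge weighting (co-occurrence counts), and the use of weighted degree as the centrality measure are all assumptions chosen for illustration; the function names are hypothetical, and this is not C-Rank's actual implementation, which also applies heuristics that are not reproduced here.

```python
from collections import defaultdict


def build_cooccurrence_graph(concepts, window=3):
    """Build an undirected weighted co-occurrence graph (hypothetical helper).

    Two distinct concepts are linked whenever they appear within `window`
    positions of each other; the edge weight counts how often that happens.
    """
    # Every concept gets a node, even if it ends up with no neighbours.
    graph = {concept: defaultdict(int) for concept in concepts}
    for i, source in enumerate(concepts):
        # Look ahead at most window - 1 positions from the current concept.
        for target in concepts[i + 1:i + window]:
            if target != source:
                graph[source][target] += 1
                graph[target][source] += 1
    return graph


def rank_by_weighted_degree(graph, top_k=3):
    """Rank nodes by weighted degree (assumed centrality measure).

    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {node: sum(neighbours.values()) for node, neighbours in graph.items()}
    return sorted(scores, key=lambda node: (-scores[node], node))[:top_k]


if __name__ == "__main__":
    # Assumed output of a concept linker, in document order.
    candidates = [
        "keyphrase extraction", "graph", "centrality",
        "keyphrase extraction", "knowledge base", "graph",
    ]
    graph = build_cooccurrence_graph(candidates, window=3)
    print(rank_by_weighted_degree(graph, top_k=3))
    # prints ['keyphrase extraction', 'graph', 'centrality']
```

Weighted degree is used only because it is the simplest centrality measure to compute by hand; a measure such as PageRank or betweenness could be substituted in `rank_by_weighted_degree` without changing how the graph is built.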

Disciplines :

Computer science

Author, co-author :

DALLE LUCCA TOSI, Mauro ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

Reis, Julio Cesar Dos

External co-authors :

yes

Language :

English

Title :

Keyphrase extraction from single textual documents based on semantically defined background knowledge and co-occurrence graphs

Publication date :

2021

Journal title :

International Journal of Metadata, Semantics and Ontologies

ISSN :

1744-2621

eISSN :

1744-263X

Publisher :

Inderscience Publishers (IEL)

Volume :

15

Issue :

2

Pages :

121–132

Peer reviewed :

Peer reviewed

Available on ORBilu :

since 06 September 2022