
NASS: A Semantic Annotation Tool for Media


Media companies today have serious difficulty managing large volumes of agency news and in-house articles. Journalists and documentalists face categorization tasks every day. The task is made harder by the size of the thesaurus, the tool typically used to tag news in the media, whose list of terms is usually very large. In this paper, we present a new method for information extraction over a set of texts where the annotation must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and finally uses a combination of Support Vector Machines (SVM), ontologies and heuristics to deduce appropriate tags for the annotation. We carried out a detailed evaluation of our method on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining promising results.
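
As a rough illustration of the first stages described above, the following sketch lemmatizes a text, extracts its most frequent keywords, and maps them to thesaurus tags. It is not the NASS implementation: the lemma table, stopword list and thesaurus are hypothetical toy data, and the SVM, ontology and heuristic stage is replaced here by a plain dictionary lookup so that the example stays self-contained.

```python
import re
from collections import Counter

# Hypothetical toy resources; the real lemmatizer and thesaurus are not
# described in the abstract.
LEMMAS = {"elections": "election", "voted": "vote", "votes": "vote",
          "parties": "party", "banks": "bank"}
STOPWORDS = {"the", "a", "in", "of", "and", "for", "on", "to"}
THESAURUS = {"election": "Politics", "vote": "Politics",
             "party": "Politics", "bank": "Economy"}


def lemmatize(text):
    """Lowercase the text, split it into words and map each word to its lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]


def keywords(lemmas, top_n=3):
    """Return the top_n most frequent lemmas that are not stopwords."""
    counts = Counter(lemma for lemma in lemmas if lemma not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def suggest_tags(text, top_n=3):
    """Map keywords to thesaurus tags (a lookup standing in for the SVM stage)."""
    tags = []
    for keyword in keywords(lemmatize(text), top_n):
        tag = THESAURUS.get(keyword)
        if tag is not None and tag not in tags:
            tags.append(tag)
    return tags
```

Calling `suggest_tags` on a short news sentence returns the list of thesaurus tags whose keywords appear among the most frequent lemmas. In a fuller version, a trained classifier would replace the lookup in `suggest_tags`.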

Keywords:

Semantic tagging and classification; Information Extraction; NLP; SVM; Ontologies; Text classification; Media; News





This article is licensed under Creative Commons - Attribution (by)
