Abstract:
Media companies today have serious difficulty managing large volumes of news from agencies and in-house articles. Journalists and documentalists face categorization tasks every day. A further difficulty is the typically large size of the thesaurus vocabulary, the thesaurus being the usual tool for tagging news in the media. In this paper, we present a new method for information extraction over a set of texts whose annotations must be composed of thesaurus elements. The method consists of applying lemmatization, extracting keywords, and finally using a combination of Support Vector Machines (SVM), ontologies, and heuristics to deduce appropriate tags for the annotation. We carried out a detailed evaluation of our method on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining promising results.
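As a rough illustration of the pipeline shape described above (normalize words, extract keywords, map them to thesaurus descriptors), the following minimal sketch may help. It is not the method evaluated in this paper: the thesaurus entries, the stopword list, and the function names are all hypothetical, the plural stripping is only a crude stand-in for real lemmatization, and a simple keyword-overlap heuristic takes the place of the SVM and ontology steps.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: descriptor -> normalized trigger terms.
THESAURUS = {
    "economy": {"bank", "market", "inflation"},
    "sports": {"match", "goal", "team"},
}

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "on", "for", "as"}


def normalize(token):
    """Strip a trailing 's' from longer words (crude stand-in for lemmatization)."""
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def keywords(text, top_n=5):
    """Return the top_n most frequent normalized, non-stopword tokens."""
    tokens = [normalize(t) for t in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def suggest_tags(text, thesaurus=THESAURUS):
    """Suggest descriptors whose trigger terms overlap the extracted keywords.

    Assumption: plain set overlap replaces the SVM/ontology stage here.
    """
    found = set(keywords(text))
    return sorted(d for d, terms in thesaurus.items() if found & terms)


print(suggest_tags("Banks and markets react as inflation rises in the markets"))
```

Under these assumptions the example headline is tagged with the single descriptor "economy". In a realistic setting the overlap heuristic would be replaced by a trained classifier, and the thesaurus would hold thousands of descriptors rather than two.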