TAPON: a two-phase machine learning approach for semantic labelling





Publicado en

Actas de las XXIV Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2019)



Through semantic labelling we enrich structured information from sources such as HTML pages, tables, or JSON files, with labels to integrate it into a local ontology. This process involves measuring some features of the information and then finding the classes that best describe it. The problem with current techniques is that they do not model relationships between classes. Their features fall short when some classes have very similar structures or textual formats. In order to deal with this problem, we have devised TAPON: a new semantic labelling technique that computes novel features that take into account the relationships. TAPON computes these features by means of a two-phase approach. In the first phase, we compute simple features and obtain a preliminary set of labels (hints). In the second phase, we inject our novel features and obtain a refined set of labels. Our experimental results show that our technique, thanks to our rich feature catalogue and novel modelling, achieves higher accuracy than other state-of-the-art techniques.


Acerca de Ayala, Daniel

Palabras clave

Information Integration, Machine Learning, Semantic Labelling
Página completa del ítem
Notificar un error en este resumen
Mostrar cita
Mostrar cita en BibTeX
Descargar cita en BibTeX