Topic Modeling for Feature Location in Software Models: Studying both Code Generation and Interpreted Models





Published in

Proceedings of the XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022)

Creative Commons License


Context: In the last 20 years, the research community has increased its attention to the use of topic modeling for software maintenance and evolution tasks in code. Topic modeling is a popular and promising information retrieval technique that represents topics by word probabilities. Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling methods. However, the use of topic modeling in model-driven software development has been largely neglected. Since software models have less noise (implementation details) than software code, software models might be well-suited for topic modeling.

Objective: This paper presents our LDA-guided evolutionary approach for feature location in software models. Specifically, we consider two types of software models: models for code generation and interpreted models.

Method: We evaluate our approach on two real-world industrial case studies: code-generation models for train control software, and interpreted models for a commercial video game. To study the impact on the results, we compare our approach for feature location in models against random search and a baseline based on Latent Semantic Indexing, which is a popular information retrieval technique. In addition, we perform a statistical analysis of the results to show that this impact is significant. We also discuss the results in terms of the following aspects: data sparsity, implementation complexity, calibration, and stability.

Results: Our approach significantly outperforms the baseline in terms of recall, precision, and F-measure when it comes to interpreted models. This is not the case for code-generation models.

Conclusions: Our analysis of the results yields a recommendation for improving the results. We also show that calibration approaches can be transferred from code to models. The findings of our work regarding the compensation of instability have the potential to help feature location not only in models, but also in code.
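
The recall, precision, and F-measure named in the Results can be illustrated with a minimal sketch. It assumes that a located model fragment and its ground truth are both represented as sets of model element identifiers; this representation, the function name `fragment_scores`, and the element names are hypothetical and are not taken from the paper.

```python
def fragment_scores(candidate, oracle):
    """Return (precision, recall, f_measure) for a candidate model fragment.

    Both arguments are collections of model element identifiers
    (an assumed representation, chosen only for illustration).
    """
    candidate, oracle = set(candidate), set(oracle)
    true_positives = len(candidate & oracle)

    # Precision: share of retrieved elements that belong to the feature.
    precision = true_positives / len(candidate) if candidate else 0.0
    # Recall: share of the feature's elements that were retrieved.
    recall = true_positives / len(oracle) if oracle else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical element names; two of the four retrieved elements are correct.
candidate = {"Door", "Button", "Alarm", "Brake"}
oracle = {"Door", "Button", "Sensor"}
precision, recall, f_measure = fragment_scores(candidate, oracle)
```

In this toy case the candidate fragment contains two of the three ground-truth elements plus two spurious ones, so precision is lower than recall and the F-measure falls between the two.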


About Pérez, Francisca

Keywords

Feature Location, Search-Based Software Engineering, Software Models, Topic Modeling