The author Carlos Cetina has published 8 articles:
Context: Leveraging machine learning techniques to address feature location on models has been gaining attention. Machine learning techniques empower software product companies to take advantage of knowledge and experience to improve the performance of the feature location process. Most machine learning-based works for feature location on models report the machine learning techniques and tuning parameters in detail. However, these works focus on the size and distribution of the data sets, neglecting the properties of their contents.
Objective: In this paper, we analyze the influence of three model fragment properties (density, multiplicity, and dispersion) on a machine learning-based approach for feature location.
Method: The analysis of these properties is based on an industrial case provided by CAF, a worldwide provider of railway solutions. The test cases were evaluated through a machine learning technique that uses different subsets of a knowledge base to learn how to locate unknown features.
Results: The results show that the density and dispersion properties have a direct impact on the results. In our case study, the model fragments with extra-small density values achieve up to 43% more precision, 41% more recall, 42% more F-measure, and 0.53 more Matthews Correlation Coefficient (MCC) than the model fragments with other density values. Likewise, the model fragments with extra-small and small dispersion values achieve up to 53% more precision, 52% more recall, 52% more F-measure, and 0.57 more MCC than the model fragments with other dispersion values.
Conclusions: The analysis of the results shows that both the density and dispersion properties significantly influence the results. These results can serve not only to improve reports by means of model fragment properties, but also to enable fair comparison of machine learning-based feature location approaches, improving feature location results.
Authors: Manuel Ballarín / Ana Cristina Marcén / Vicente Pelechano / Carlos Cetina
Keywords: Feature location - Learning to Rank - Machine Learning - Model Fragment Location
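The abstract above reports results in terms of precision, recall, F-measure, and the Matthews Correlation Coefficient. As a refresher, all four can be computed from the cells of a binary confusion matrix; the counts below are illustrative, not taken from the study:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure, and Matthews Correlation
    Coefficient from the cells of a binary confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, f_measure, mcc

# Illustrative example: a located model fragment with 8 true positives,
# 2 false positives, and 4 false negatives out of 50 model elements.
p, r, f, mcc = classification_metrics(tp=8, fp=2, fn=4, tn=36)
```

Unlike F-measure, MCC also takes the true negatives into account, which is why the paper reports it alongside the other three metrics.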
Context: In the last 20 years, the research community has increased its attention to the use of topic modeling for software maintenance and evolution tasks in code. Topic modeling is a popular and promising information retrieval technique that represents topics by word probabilities. Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling methods. However, the use of topic modeling in model-driven software development has been largely neglected. Since software models have less noise (implementation details) than software code, software models might be well-suited for topic modeling.
Objective: This paper presents our LDA-guided evolutionary approach for feature location in software models. Specifically, we consider two types of software models: models for code generation and interpreted models.
Method: We evaluate our approach considering two real-world industrial case studies: code-generation models for train control software, and interpreted models for a commercial video game. To study the impact on the results, we compare our approach for feature location in models against random search and a baseline based on Latent Semantic Indexing, which is a popular information retrieval technique. In addition, we perform a statistical analysis of the results to show that this impact is significant. We also discuss the results in terms of the following aspects: data sparsity, implementation complexity, calibration, and stability.
Results: Our approach significantly outperforms the baseline in terms of recall, precision, and F-measure when it comes to interpreted models. This is not the case for code-generation models.
Conclusions: Our analysis of the results uncovers a recommendation towards results improvement. We also show that calibration approaches can be transferred from code to models. The findings of our work with regard to the compensation of instability have the potential to help feature location not only in models, but also in code.
Authors: Francisca Pérez / Raúl Lapeña / Ana Cristina Marcén / Carlos Cetina
Keywords: Feature location - Search-Based Software Engineering - Software Models - Topic Modeling
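Both LDA and the Latent Semantic Indexing baseline ultimately rank model fragments by textual similarity to a feature query. A deliberately simplified stand-in for that retrieval step, using plain bag-of-words vectors and cosine similarity (the fragment names and texts are made up for illustration):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_fragments(query, fragments):
    """Rank model fragments by textual similarity to the query."""
    q = Counter(query.lower().split())
    scores = {name: cosine(q, Counter(text.lower().split()))
              for name, text in fragments.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

fragments = {
    "doors":  "door control open close sensor",
    "brakes": "brake pressure valve emergency stop",
}
ranking = rank_fragments("emergency brake stop", fragments)
```

LDA and LSI replace the raw word counts with topic or latent-concept vectors, but the ranking-by-similarity skeleton stays the same.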
The benefits of Software Product Lines (SPL) are very appealing: software development becomes better, faster, and cheaper. Unfortunately, these benefits come at the expense of a migration from a family of products to an SPL. Feature Location can be useful in achieving the transition to SPLs. This work presents our FeLLaCaM approach for Feature Location. Our approach calculates the similarity to a description of the feature to locate, the occurrences where the candidate features remain unchanged, and the changes performed to the candidate features throughout the retrospective of the product family. We evaluated our approach in two long-living industrial domains: a model-based family of firmwares for induction hobs that has been developed over more than 15 years, and a model-based family of PLC software to control trains that has been developed over more than 25 years. In our evaluation, we compare our FeLLaCaM approach with two other approaches for Feature Location: (1) FLL (Feature Location through Latent Semantic Analysis) and (2) FLC (Feature Location through Comparisons). We measure the performance of FeLLaCaM, FLL, and FLC in terms of recall, precision, Matthews Correlation Coefficient, and Area Under the Receiver Operating Characteristics curve. The results show that FeLLaCaM outperforms FLL and FLC.
Authors: Carlos Cetina / Jaime Font / Lorena Arcega / Francisca Pérez
Keywords: Architecture sustainability - Feature location - Long-Living software systems - Model-Driven Engineering - software product lines
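FeLLaCaM combines three signals per candidate feature: similarity to the feature description, occurrences where the candidate remains unchanged, and changes across the product family's retrospective. A sketch of one way to combine such signals into a single ranking score; the weights and inputs are illustrative (normalized to [0, 1]), since the abstract does not publish the actual formula or calibration:

```python
def fellacam_score(similarity, stability, change_rate,
                   w_sim=0.4, w_stab=0.3, w_chg=0.3):
    """Weighted combination of the three FeLLaCaM-style signals.
    All inputs are assumed normalized to [0, 1]; the weights are
    illustrative, not the published calibration."""
    return w_sim * similarity + w_stab * stability + w_chg * change_rate

candidates = {
    "fragment_a": fellacam_score(similarity=0.9, stability=0.8, change_rate=0.1),
    "fragment_b": fellacam_score(similarity=0.5, stability=0.2, change_rate=0.9),
}
best = max(candidates, key=candidates.get)  # highest-ranked candidate
```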
Context: Traceability Links Recovery (TLR), Bug Localization (BL), and Feature Location (FL) are amongst the most relevant tasks performed during software maintenance. However, most research in the field targets code, while models have not received enough attention yet.
Objective: This paper presents our approach (FROM, Fragment Retrieval on Models), which uses an Evolutionary Algorithm to retrieve the most relevant model fragments for three different types of input queries: natural language requirements for TLR, bug descriptions for BL, and feature descriptions for FL.
Method: FROM uses an Evolutionary Algorithm that generates model fragments through genetic operations and assesses the relevance of each model fragment with regard to the provided query through a fitness configuration. We analyze the influence that four fitness configurations have on the results of FROM, combining three objectives: Similitude, Understandability, and Timing. To do this, we use a real-world case study from our industrial partner, a worldwide leader in train manufacturing. We record the results in terms of recall, precision, and F-measure. Moreover, the results are compared against those obtained by a baseline, and a statistical analysis is performed to provide evidence of the significance of the results.
Results: The results show that FROM can be applied in our industrial case study. The results also show that the configurations and the baseline have significant differences in performance for the TLR, BL, and FL tasks. Moreover, our results show that there is no single configuration that is powerful enough to obtain the best results in all tasks.
Conclusions: The type of task performed (TLR, BL, or FL) during the retrieval of model fragments has an actual impact on the results of the configurations of the Evolutionary Algorithm. Our findings suggest which configuration offers better results, as well as the objectives that do not contribute to improving the results.
Authors: Francisca Pérez / Raúl Lapeña / Jaime Font / Carlos Cetina
Keywords: Bug localization - Conceptual models - Evolutionary algorithms - Feature location - Traceability links recovery
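A fitness configuration in FROM's sense selects which objectives (Similitude, Understandability, Timing) contribute to a fragment's fitness. The sketch below averages a chosen subset of objectives; the objective functions themselves are illustrative stand-ins, since the abstract does not define them:

```python
def text_overlap(text, query):
    """Illustrative Similitude stand-in: fraction of query terms in the text."""
    a, b = set(text.split()), set(query.split())
    return len(a & b) / len(b) if b else 0.0

def fitness(fragment, query, config):
    """Fitness of a model fragment under one fitness configuration,
    i.e., the average of the selected objectives."""
    objectives = {
        "similitude": lambda: text_overlap(fragment["text"], query),
        "understandability": lambda: 1.0 / (1 + fragment["size"]),  # smaller is clearer
        "timing": lambda: fragment["recency"],
    }
    return sum(objectives[name]() for name in config) / len(config)

frag = {"text": "door open close sensor", "size": 4, "recency": 0.5}
score = fitness(frag, "door sensor", config=["similitude", "understandability"])
```

Comparing runs under different `config` subsets is the kind of analysis the paper performs, here reduced to a toy setting.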
Context: Commercial video games usually feature an extensive source code and requirements that are related to code lines from multiple methods. Traceability is vital in terms of maintenance and content updates, so it is necessary to explore such search spaces properly.
Objective: This work presents and evaluates CODFREL (Code Fragment-based Requirement Location), our approach to fine-grained requirement traceability, which relies on an evolutionary algorithm and includes an encoding and genetic operators to manipulate code fragments that are built from source-code lines. We compare it with a baseline approach (Regular-LSI) by configuring both approaches with different granularities (code lines / complete methods).
Method: We evaluated our approach and Regular-LSI in the Kromaia video game case study, which is a commercial video game released on PC and PlayStation 4. The approaches are configured with method and code-line granularity and work on 20 requirements that are provided by the development company. Our approach and Regular-LSI calculate similarities between requirements and code fragments or methods to propose possible solutions and, in the case of CODFREL, to guide the evolutionary algorithm.
Results: The results, which compare the code-line and method granularity configurations of CODFREL with the different granularity configurations of Regular-LSI, show that our approach outperforms Regular-LSI in precision and recall, with values that are 26 and 8 times better, respectively, even though it does not achieve the optimal solutions. We make an open-source implementation of CODFREL available.
Conclusions: Since our approach takes into consideration key issues like the source-code size in commercial video games and requirement dispersion, it provides better starting points than Regular-LSI in the search for solution candidates for the requirements. However, the results and the influence of the domain-specific language on them show that more explicit knowledge is required to improve those results.
Authors: Daniel Blasco / Carlos Cetina / Óscar Pastor
Keywords: Evolutionary computation - Requirement - Source code - Traceability - Video game
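CODFREL's encoding and genetic operators manipulate code fragments built from source-code lines. One common way to encode such fragments for an evolutionary algorithm (an assumption here, not necessarily the paper's exact encoding) is a bit mask over the lines, with single-point crossover and bit-flip mutation:

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover over two line-inclusion masks."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(mask, rng, rate=0.05):
    """Flip each line's inclusion bit with a small probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]

rng = random.Random(42)  # seeded for reproducibility
# Each individual marks which of 10 source-code lines belong to the fragment.
a = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
b = [0, 0, 1, 1, 0, 0, 1, 0, 0, 1]
child = mutate(crossover(a, b, rng), rng)
```

A fitness function (e.g., textual similarity between the selected lines and a requirement) would then drive selection over a population of such masks.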
In the context of Model-Driven Engineering applied to video games, software models are high-level abstractions that represent source-code implementations of varied content such as the stages of the game, vehicles, or enemy entities (e.g., final bosses). In this work, we present our Evolutionary Model Generation (EMoGen) approach to generate software models that are comparable in quality to the models created by human developers. Our approach is based on an evolution (mutation and crossover) and assessment cycle to generate the software models. We evaluated the software models generated by EMoGen in the Kromaia video game, which is a commercial video game released on Steam and PlayStation 4. Each model generated by EMoGen has more than 1,000 model elements. The results, which compare the software models generated by our approach and those generated by the developers, show that our approach achieves results that are comparable to the ones created manually by the developers in the retail and digital versions of the video game case study. However, our approach only takes five hours of unattended time, in comparison to ten months of work by the developers. We perform a statistical analysis, and we make an implementation of EMoGen readily available.
Authors: Daniel Blasco / Jaime Font / Mar Zamorano / Carlos Cetina
Keywords: Game Software Engineering - Model-Driven Engineering - Search-Based Software Engineering
Traceability Link Recovery (TLR) has been a topic of interest for many years within the software engineering community. In recent years, TLR has been attracting more attention, becoming the subject of both fundamental and applied research. However, there still exists a large gap between the actual needs of industry on the one hand and the solutions published through academic research on the other. In this work, we propose a novel approach, named Evolutionary Learning to Rank for Traceability Link Recovery (TLR-ELtoR). TLR-ELtoR recovers traceability links between a requirement and a model through the combination of evolutionary computation and machine learning techniques, generating as a result a ranking of model fragments that can realize the requirement. TLR-ELtoR was evaluated in a real-world case study in the railway domain, comparing its outcomes with those of five TLR approaches (Information Retrieval, Linguistic Rule-based, Feedforward Neural Network, Recurrent Neural Network, and Learning to Rank). The results show that TLR-ELtoR achieved the best results for most performance indicators, providing a mean precision value of 59.91%, a recall value of 78.95%, a combined F-measure of 62.50%, and an MCC value of 0.64. The statistical analysis of the results assesses the magnitude of the improvement, and the discussion presents why TLR-ELtoR achieves better results than the baselines.
Authors: Ana Cristina Marcén / Raúl Lapeña / Oscar Pastor / Carlos Cetina
Keywords: Evolutionary algorithm - Learning to Rank - models - requirements engineering - Traceability Link Recovery
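TLR-ELtoR couples evolutionary computation with learning to rank. The core learning-to-rank idea, reduced to a minimal pairwise ranker over hand-made feature vectors (a sketch of the technique, not the paper's algorithm), can be shown as:

```python
def train_pairwise_ranker(pairs, n_features, epochs=20, lr=0.1):
    """Perceptron-style pairwise ranker: learn weights so that for each
    (better, worse) feature-vector pair, score(better) > score(worse)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, better, worse))
            if margin <= 0:  # pair misranked: nudge weights toward 'better'
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w

# Each pair holds feature vectors for a (relevant, irrelevant) model fragment,
# e.g., (textual similarity, fragment size); values are made up.
pairs = [([0.9, 0.2], [0.1, 0.8]), ([0.8, 0.3], [0.2, 0.7])]
w = train_pairwise_ranker(pairs, n_features=2)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
```

In TLR-ELtoR's setting, the ranked items are model fragments and the learned scorer orders candidates by how well they realize a requirement.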
Context: Maintenance activities cannot be completed without locating the set of software artifacts that realize a particular feature of a software system. Manual Feature Location (FL) is widely used in industry, but it becomes challenging (time-consuming and error-prone) in large software repositories. To reduce manual effort, automated FL techniques have been proposed. Research efforts in FL tend to make comparisons between automated FL techniques, ignoring manual FL techniques. Moreover, existing research puts the focus on code, neglecting other artifacts such as models.
Objective: This paper aims to compare manual FL against automated FL in models in order to answer important questions about the performance, productivity, and satisfaction of both treatments.
Method: We run an experiment comparing manual and automated FL on a set of 18 subjects (5 experts and 13 non-experts) in the domain of our industrial partner, BSH, a manufacturer of induction hobs for more than 15 years. We measure the performance (recall, precision, and F-measure), productivity (ratio between F-measure and spent time), and satisfaction (perceived ease of use, perceived usefulness, and intention to use) of both treatments, and perform statistical tests to assess whether the obtained differences are significant.
Results: Regarding performance, manual FL significantly outperforms automated FL in precision and F-measure (by up to 27.79% and 19.05%, respectively), whereas automated FL significantly outperforms manual FL in recall (by up to 32.18%). Regarding productivity, manual FL obtains 3.43%/min, which significantly improves on automated FL. Finally, there are no significant differences in satisfaction between the treatments.
Conclusions: The findings of our work can be leveraged to advance research towards improving the results of manual and automated FL techniques. For instance, automated FL in industry faces issues such as low discrimination capacity. In addition, the obtained satisfaction results have implications for the usage and possible combination of manual, automated, and guided FL techniques.
Authors: Francisca Pérez / Jorge Echeverría / Raúl Lapeña / Carlos Cetina
Keywords: Conceptual models - Controlled Experiment - Feature location - Model-Driven Engineering
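The productivity measure in this experiment is the ratio between F-measure and spent time, reported in %/min. With illustrative numbers (not the study's data):

```python
def productivity(f_measure, minutes):
    """Productivity as defined in the study: F-measure (as a
    percentage) achieved per minute of feature-location effort."""
    return (f_measure * 100) / minutes

# Hypothetical participant: an F-measure of 0.60 reached in 20 minutes.
p = productivity(0.60, 20)
```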