Ingeniería y Ciencia de Datos
Papers in the Ingeniería y Ciencia de Datos category published in the Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024).
Recent submissions
Article: Sistema predictivo de la productividad en el cultivo de caña de azúcar a corto plazo
Morales Lemus, Joel Estuardo; Mera, David; Quemé de León, José Luís; Taboada, José Ángel. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
A system that predicts tons of sugar cane per hectare is key to analysing the crop's response to different meteorological conditions and to the agricultural activities carried out. This work presents a machine-learning system centred on the analysis of the NDVI vegetation index that produces predictions from the fifth to the twelfth month of cultivation, achieving in the final month a coefficient of determination of 82% with a 3.2% prediction error, and reducing the error by 54% compared with the conventional estimate made by the farm manager.

Abstract: An Approach for Proactive Mobile Recommendations Based on User-Defined Rules
Ilarri, Sergio; Trillo-Lado, Raquel. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
In the Big Data era, context-aware mobile recommender systems are crucial in assisting citizens and tourists in making informed decisions, providing a suitable way for users to find relevant data. These systems should be proactive, able to detect the ideal time and location to provide recommendations for a specific item or activity. To accomplish this, push-based recommender systems can be employed, utilizing context rules to determine when a recommendation should be initiated. However, there is very limited reported experience in defining and implementing such systems, and a complete generic solution that adapts flexibly to the preferences of users and protects their privacy is still missing. In this paper, we present a novel approach where appropriate types of recommendations are provided automatically, without the need for user input.
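The rule-triggered, push-based behaviour described above can be sketched with a toy context rule evaluated locally on the device (all rule names, fields, and thresholds here are invented for illustration and are not the paper's actual rule language):

```python
from dataclasses import dataclass

# Hypothetical context snapshot; in the approach above it is evaluated
# locally on the user's device, so raw context data never leaves it.
@dataclass
class Context:
    hour: int      # current hour of day
    place: str     # semantic location label
    on_foot: bool  # whether the user is walking

# A user-manageable rule: users can activate, deactivate, or customize them.
@dataclass
class Rule:
    name: str
    active: bool
    condition: object    # predicate Context -> bool
    recommendation: str  # type of recommendation to push when it fires

def triggered(rules, ctx):
    """Types of recommendations whose active rules fire for this context."""
    return [r.recommendation for r in rules if r.active and r.condition(ctx)]

rules = [
    Rule("lunch", True, lambda c: 12 <= c.hour <= 14 and c.on_foot, "restaurants"),
    Rule("sights", True, lambda c: c.place == "city_centre", "museums"),
    Rule("nightlife", False, lambda c: c.hour >= 21, "bars"),  # deactivated by the user
]

print(triggered(rules, Context(hour=13, place="city_centre", on_foot=True)))
# -> ['restaurants', 'museums']
```

Keeping rule evaluation on the device, as the paper proposes, means only the resulting recommendation types (not the context data) need to leave it.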
Our proposal allows users to easily activate, deactivate, customize, and create rules for improved personalization. Additionally, the module that decides, based on the context, which types of recommendations are required is executed on the user's mobile device, reducing wireless communication and safeguarding the user's privacy, as context data are evaluated locally. To illustrate the approach, we have developed R-Rules, a prototype for Android devices focused on the triggering of recommendation rules, which provides a friendly user interface that facilitates personalization. We have evaluated various technological options and demonstrated the feasibility, performance, and scalability of the proposal, as well as its suitability to users' needs.

Article: Predicción de Tráfico con Redes Neuronales Artificiales
Al-Rahamneh, Anas; Budiño, Alejandro; Villarroya, Sebastián; Cotos, José Manuel. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Cities are constantly growing in number of vehicles, which drives a growing demand for efficient traffic-management tools. Traffic prediction models are essential components of such tools. In this work we study the performance of different deep learning architectures on the problem of traffic prediction at a specific location in the city of Santiago de Compostela. So far, Perceptron, Recurrent, and Convolutional models have been tested. These architectures are compared through the design and execution of different experiments using traffic-sensor data. A new metric for evaluating model performance is also introduced.

Article: Towards a Process Reference Model for Citizen Science Projects Formulation: a First Approach
Guerra-García, César; Caballero, Ismael.
Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Citizen science, also referred to as public participation in scientific research and knowledge production, is gaining increasing recognition as a well-developed and valued approach. This approach has a global reach and is employed across a diverse spectrum of scientific domains and disciplines. Presently, the literature highlights several concerns related to citizen science data and processes, leading to insufficient quality levels and, consequently, unacceptable situations within citizen science projects. These issues have a direct impact on the sustainability of such projects. To alleviate these undesirable effects, we posit that the standardization of some best practices around citizen science projects can lead to better performance of the processes and their results. One of the most relevant concerns is the quality of the data used at the various stages of the data life cycle, from its generation by participants (researchers and citizens) to the usage and exploitation of the data. The main contribution of this work is twofold: on the one hand, to identify the best practices related to citizen science projects, and on the other hand, to investigate how these best practices can be enriched with others related to data quality management and data governance. As a result, we produced CI.SCI.FORM, a framework that can be used to support institutions in better proposing and executing their citizen science projects. This framework consists of two main components: a Process Reference Model (PRM) and a Process Assessment Model (PAM). In this paper we first introduce the CI.SCI.FORM PRM, which gathers 16 processes grouped in 4 blocks.

Article: Procesamiento aproximado para la búsqueda interactiva en conjuntos de datos moleculares
Luaces Cachaza, David; Viqueira, José R.R.; Martinez Casas, David; Rey, Andrea; Vázquez Álvarez, Álvaro.
Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
There are many situations in which users of chemical-compound databases may need to search for molecules whose molecular structure contains a given substructure. One example is the removal of replicas of known compounds. A molecular structure is represented by a graph whose nodes are atoms of chemical elements (carbon, nitrogen, hydrogen, etc.) and whose edges are chemical bonds between atoms. Substructure search thus becomes a well-known problem in computing: subgraph search. In an interactive filtering interface, the database is filtered automatically and interactively as the user specifies the query. A well-known example is the keyword-based filtering used in full-text search, where the result is updated interactively as the user types the letters of the query. In this kind of interface, the precision of the answer is secondary to the response time of the system, which must be interactive (below one second). Similar interfaces can be built for the interactive search of molecular structures, in which the user builds the structure graphically while the query result is updated interactively. Current subgraph-search solutions combine an index with a graph-isomorphism algorithm that removes false positives obtained from the index. These solutions provide fully precise answers in interactive times when the databases are reasonably small (around a million graphs).
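The index-then-verify scheme just described is commonly implemented with bit fingerprints: a query can only match a molecule if every bit set in the query's fingerprint is also set in the molecule's, so a cheap bitwise test discards most non-matches before the slow isomorphism check. A minimal sketch of that screening step (the feature sets below are invented toy examples, not a real molecular fingerprint):

```python
import zlib

def fingerprint(features, n_bits=64):
    """Fold a set of feature strings into an n-bit mask (deterministic)."""
    mask = 0
    for f in features:
        mask |= 1 << (zlib.crc32(f.encode()) % n_bits)
    return mask

# Hypothetical per-molecule feature sets (e.g. atom and bond labels).
database = {
    "mol_a": fingerprint({"C", "N", "O", "C-N", "C=O"}),
    "mol_b": fingerprint({"C", "H", "C-H"}),
}

def candidates(query_features, db):
    """Molecules whose fingerprint covers the query's: possible matches.
    False positives are allowed (bit collisions); false negatives are not."""
    q = fingerprint(query_features)
    return [name for name, fp in db.items() if fp & q == q]

print(candidates({"C", "C=O"}, database))
```

Because collisions can only add candidates, this screening preserves completeness; precision is what an approximate variant trades away for speed.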
However, when databases exceed tens of millions of graphs, response times soar, even with advanced implementations on parallel architectures. These high response times arise from applying the (very slow) graph-isomorphism algorithm to large result sets obtained from the index, either because the query is not very selective or because the database is very large. This work will test different approximate representation structures for datasets in order to implement approximate substructure search that always offers interactive response times (below one second). The overall goal is to find the configurations of structures and parameters that offer the best trade-off between response time and search effectiveness (precision and completeness), always prioritising an interactive response.

Abstract: AYNEXT - tools for streamlining the evaluation of link prediction techniques
Sola, Fernando; Ayala, Daniel; Ayala, Rafael; Hernandez, Inma; Rivero, Carlos R.; Ruiz, David. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
AYNEXT is an open-source Python suite aimed at researchers in the field of link prediction in Knowledge Graphs. Link prediction consists of predicting missing edges in a Knowledge Graph, which usually involves applying different techniques to generate negative examples (false triples) to fit a model, and splitting edges into training, testing, and validation sets. Setting up a correct evaluation, or testing new negatives-generation strategies, becomes more challenging as more complex strategies and considerations (e.g., removal of inverse relations) emerge.
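The negatives-generation step mentioned above is typically done by corrupting true triples; a minimal sketch under that standard scheme (the toy knowledge graph is invented for illustration):

```python
import random

# Negatives generation by corruption: replace the head or tail of a true
# triple with a random entity, rejecting candidates that are themselves
# known true triples.
triples = {
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "madrid"),
}
entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})

def corrupt(triple, entities, known, rng):
    """Return a corrupted copy of `triple` that is not a known true triple."""
    h, r, t = triple
    while True:
        if rng.random() < 0.5:
            cand = (rng.choice(entities), r, t)   # corrupt the head
        else:
            cand = (h, r, rng.choice(entities))   # corrupt the tail
        if cand not in known and cand != triple:
            return cand

rng = random.Random(0)
negatives = [corrupt(t, entities, triples, rng) for t in sorted(triples)]
```

Complications such as filtering inverse relations, which the abstract mentions, are exactly what make hand-rolled versions of this step error-prone.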
AYNEXT makes it easy to configure and customize the creation of evaluation datasets and the computation of evaluation metrics and statistical significance tests for each pair of link prediction techniques. AYNEXT has been designed to be simple to use, but modular enough to enable customization of the main steps in the evaluation process. AYNEXT-DataGen covers the pre-processing, splitting, and negatives-generation steps of the evaluation process, while AYNEXT-ResTest covers the metric computation and the statistical tests. AYNEXT offers a simple-to-use command-line interface that takes as input either a Knowledge Graph in standard formats or the results of applying existing techniques, but it can also be used programmatically for in-depth customization.

Abstract: Deep embeddings and Graph Neural Networks: using context to improve domain-independent predictions
Sola, Fernando; Ayala, Daniel; Hernandez, Inma; Ruiz, David. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Graph neural networks (GNNs) are deep learning architectures that apply graph convolutions through message-passing processes between nodes, represented as embeddings. GNNs have recently become popular because of their ability to obtain a contextual representation of each node, taking into account information from its surroundings. However, existing work has focused on the development of GNN architectures, using basic domain-specific information about the nodes to compute embeddings. Meanwhile, in the closely related area of knowledge graphs, much effort has been put towards developing deep learning techniques to obtain node embeddings that preserve information about relationships and structure without relying on domain-specific data. The potential application of deep embeddings of knowledge graphs in GNNs remains largely unexplored.
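The message-passing idea at the heart of a GNN layer can be sketched in plain Python: each node's next embedding aggregates its own and its neighbours' current embeddings (here a simple mean, without the learned weights and non-linearity a real GNN layer applies; graph and embeddings are toy values):

```python
# One round of message passing on a toy graph.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}   # adjacency lists
emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, -1.0]}

def message_pass(graph, emb):
    """New embeddings: mean of each node's own and neighbour embeddings."""
    new = {}
    for node, neighbours in graph.items():
        msgs = [emb[node]] + [emb[n] for n in neighbours]
        new[node] = [sum(dim) / len(msgs) for dim in zip(*msgs)]
    return new

emb1 = message_pass(graph, emb)
print(emb1["b"])  # b has absorbed context from its neighbour a
```

The paper's question is, in these terms, what happens when the initial `emb` vectors come from deep knowledge-graph embeddings rather than from domain-specific node attributes.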
In this paper, we carry out a number of experiments to answer open research questions about the impact on GNN performance when combined with deep embeddings. We test 7 different deep embeddings across several attribute prediction tasks on two state-of-the-art attribute-rich datasets. We conclude that, while there is a significant performance improvement, its magnitude varies heavily depending on the specific task and deep embedding technique considered.

Abstract: An ontology-based secure design framework for graph-based databases
Paneque, Manuel; Roldan-Garcia, Maria Del Mar; Blanco, Carlos; Maté, Alejandro; G.Rosado, David; Trujillo, Juan. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Graph-based databases are designed with performance and flexibility in mind. Most of the existing approaches used to design secure NoSQL databases are limited to the final implementation stage, and do not address security and access-control issues at higher abstraction levels. Ensuring security and access control for graph-based databases is difficult, as each approach differs significantly depending on the technology employed. In this paper, we propose the first technology-agnostic framework with which to design secure graph-based databases. Our proposal raises the abstraction level by using ontologies to model database and security requirements together. This is supported by the TITAN framework, which facilitates the way in which both aspects are dealt with. The great advantages of our approach are, therefore, that it: allows database designers to focus on the simultaneous protection of security and data while ignoring the implementation details; facilitates secure design and the rapid migration of security rules by deriving specific security measures for each underlying technology; and enables database designers to employ ontology reasoning in order to verify whether the security rules are consistent.
We show the applicability of our proposal by applying it to a case study based on hospital data access control.

Abstract: MOODY: An ontology-driven framework for standardizing multi-objective evolutionary algorithms
Aldana Martín, José Francisco; Roldan-Garcia, Maria Del Mar; Nebro Urbaneja, Antonio Jesus; Aldana Montes, José Francisco. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
The application of semantic technologies, particularly ontologies, in the realm of multi-objective evolutionary algorithms is overlooked despite their effectiveness in knowledge representation. In this paper, we introduce MOODY, an ontology specifically tailored to formalize these kinds of algorithms, encompassing their respective parameters, and multi-objective optimization problems based on a characterization of their search-space landscapes. MOODY is designed to be particularly applicable to automatic algorithm configuration, which involves searching for the parameters of an optimization algorithm that optimize its performance. In this context, we observe a notable absence of standardized components, parameters, and related considerations, such as problem characteristics and algorithm configurations. This lack of standardization introduces difficulties in the selection of valid component combinations and in the re-use of algorithmic configurations across different algorithm implementations. MOODY offers a means to infuse semantic annotations into the configurations found by automatic tools, enabling efficient querying of the results and seamless integration across diverse sources through their incorporation into a knowledge graph. We validate our proposal by presenting four case studies.

Abstract: NORA: Scalable OWL reasoner based on NoSQL databases and Apache Spark
Benítez Hidalgo, Antonio; Navas Delgado, Ismael; Roldan-Garcia, Maria Del Mar.
Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Reasoning is the process of inferring new knowledge and identifying inconsistencies within ontologies. Traditional techniques often prove inadequate when reasoning over large Knowledge Bases containing millions or billions of facts. This paper introduces NORA, a persistent and scalable OWL reasoner built on top of Apache Spark, designed to address the challenges of reasoning over extensive and complex ontologies. NORA exploits the scalability of NoSQL databases to effectively apply inference rules to Big Data ontologies with large ABoxes. To facilitate scalable reasoning, OWL data, including class and property hierarchies and instances, are materialized in the Apache Cassandra database. Spark programs are then evaluated iteratively, uncovering new implicit knowledge from the dataset and leading to enhanced performance and more efficient reasoning over large-scale ontologies. NORA has undergone a thorough evaluation with benchmarking ontologies of varying sizes to assess the scalability of the developed solution.

Article: Onto-CARMEN: Ontology-driven approach for Cyber-Physical System Security Requirements meta-modelling and reasoning
Blanco, Carlos; G.Rosado, David; Varela Vaca, Angel Jesus; Gómez López, Maria Teresa; Fernandez-Medina, Eduardo. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
In recent years, cyber-physical systems (CPS) have attracted substantial mainstream attention, especially in the industrial sector, since they have become a focus of cyber-attacks. CPS are complex systems that encompass a great variety of hardware and software components with countless configurations and features. For this reason, the construction, validation, and diagnosis of security in CPS become a major challenge.
An invalid security requirement for a CPS can produce a partial or incomplete configuration, or even misconfigurations, and hence catastrophic consequences. It is therefore crucial to ensure the validation of the security-requirements specification from the earliest design stages. To this end, Onto-CARMEN is proposed, a semantic approach that enables the automatic verification and diagnosis of security requirements according to the ENISA and OWASP recommendations. Our approach provides a mechanism for the specification of security requirements on top of ontologies, and automatic diagnosis through semantic axioms and SPARQL rules. The approach has been validated using security requirements from a real case study.

Article: Proceso para la evaluación de la equidad en sistemas de Inteligencia Artificial
Navarro, Álvaro; Lavalle, Ana; Maté, Alejandro; Trujillo, Juan. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Despite the relevance of Artificial Intelligence (AI) systems and machine learning (ML) models in society, such systems and models are often limited by the opacity of their decision-making. This point is key, since the lack of interpretability can allow unfair decisions to be taken covertly, preventing corrective actions from being applied to solve the problem. Although different works have tackled the challenge of opacity from an explainability standpoint, there is a lack of proposals that exploit the information in the data for this purpose. In this context, and to support the ML expert in analysis and decision-making, we present a process based on a hierarchical algorithm, called the Fairness Detection Tree (ADE, from its Spanish acronym), which recursively traverses the data to build an analysis tree (AAE).
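A group-level fairness check of the kind such a process can apply at each step may be sketched as a demographic-parity comparison (toy data and a standard textbook metric for illustration; the paper's own metrics, adapted to small and imbalanced datasets, are not reproduced here):

```python
# Demographic parity: compare favourable-outcome rates across groups.
# Each record is (group, decision), where decision 1 is favourable.
records = [
    ("group_x", 1), ("group_x", 1), ("group_x", 0), ("group_x", 1),
    ("group_y", 1), ("group_y", 0), ("group_y", 0), ("group_y", 0),
]

def positive_rate(records, group):
    """Share of favourable decisions within one group."""
    decisions = [d for g, d in records if g == group]
    return sum(decisions) / len(decisions)

def parity_gap(records, g1, g2):
    """Absolute difference in favourable-outcome rates between two groups."""
    return abs(positive_rate(records, g1) - positive_rate(records, g2))

print(parity_gap(records, "group_x", "group_y"))  # 0.75 vs 0.25 -> 0.5
```

A large gap on a subgroup flags it for inspection, which is the role the analysis tree plays in the proposal above.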
Combined with explainability techniques, we link the characteristics of groups to the model's decisions, providing through the AAE information that can improve confidence in the results and foster a better understanding of the model's decision-making process. The main contributions of this work are: (i) we define fairness metrics that take small and/or imbalanced datasets into account; (ii) we automatically analyse the dataset by exploiting the weights extracted from the model; and (iii) we identify groups that have potentially been treated unfairly. To demonstrate the applicability of our proposal, we analyse its effectiveness in four different domains.

Article: ELLIOT: una herramienta para el procesamiento y análisis de datos de consumo energético de software
Gordillo, Alberto; García, Felix; Calero, Coral. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
This work presents ELLIOT, a tool capable of analysing the energy-consumption data from the execution of a given piece of software and of generating charts and reports that ease their interpretation. ELLIOT is part of FEETINGS (Framework for Energy Efficiency Testing to Improve eNvironmental Goals of the Software), and its main advantages are: (i) managing energy-measurement projects in a systematic and organised way; and (ii) easing the analysis of consumption results, improving the productivity and reliability of that task.

Abstract: Scalable approach for high-resolution land cover: a case study in the Mediterranean Basin
Burgueño Romero, Antonio Manuel; Aldana Martín, José Francisco; Vázquez Pendón, María; Barba-González, Cristóbal; Jiménez Gómez, Yaiza; García Millán, Virginia; Navas Delgado, Ismael.
Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
The production of land cover maps is an everyday application of image classification in remote sensing. However, managing Earth-observation satellite data for a large region of interest makes the creation of land cover maps challenging. Since satellite imagery is getting more precise and extensive, Big Data techniques are becoming essential to handle the rising quantity of data. Furthermore, given the complexity of managing and analysing the data, defining a methodology that breaks the process down into smaller steps is vital to data processing. This paper presents a Big Data methodology for creating land cover maps employing artificial intelligence algorithms. Machine Learning algorithms are considered for remote sensing and geodata classification, supported by explainable artificial intelligence. Furthermore, the process covers downloading data from different satellites (Copernicus and ASTER), pre-processing and processing the data in a distributed environment, and visualising the result. The methodology is validated in a test case by creating a land cover map of the Mediterranean Basin.

Abstract: e-Science workflow: A semantic approach for airborne pollen prediction
Hurtado, Sandro; Antequera, María Luisa; Barba-González, Cristóbal; Picornell, Antonio; Navas Delgado, Ismael. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Allergic rhinitis has become a global health problem in recent decades because airborne pollen is a primary trigger of this respiratory disorder. Moreover, pollinosis can exacerbate the symptoms of asthma and favour respiratory infections.
Seasonal pollen trends and climatic circumstances (such as temperature, precipitation, relative humidity, wind speed and direction, and other variables) can impact daily airborne pollen concentrations, influencing local pollen emission and dispersion. Because of that, pollen monitoring and prediction are becoming more relevant to the urban population and are attracting growing scientific interest. Due to the high volume of data such tasks involve, scientists are starting to use computational tools like workflows to automate and speed up the process. Furthermore, drawing on expert knowledge of the scientific domain is critical for improving the analysis, allowing, among other things, better workflow configuration and data provenance. As semantic web technologies have proven to be an essential means of knowledge representation, we implemented this workflow information as an ontology using formats such as RDF(S) and OWL. Consequently, this paper provides a semantically enhanced e-Science workflow based on the TITAN framework for pollen forecasting analysis using meteorological data. Furthermore, a catalogue of components is developed on the TITAN framework, which allows the creation of different workflow versions. Two case studies of pollen prediction were developed to test the implementation of the aforementioned methodologies. Both were elaborated with airborne pollen data obtained in the city of Málaga (Spain): one for the Platanus pollen type (narrow annual main pollination period) and the other for the Amaranthaceae pollen type (extensive annual main pollination period). The predictions were conducted using machine and deep learning algorithms such as SARIMA and CNN-LSTM, which aim to optimise the pollen prediction procedure depending on its seasonal profile.

Article: Generación de Imágenes de Fachadas con un Dataset Reducido mediante GANs, Transfer Learning y Data Augmentation
García Carrasco, Jorge; Teruel, Miguel A.; Maté, Alejandro; Trujillo, Juan.
Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Deep Learning (DL) has been successfully applied in several fields, such as computer vision and natural language processing. In particular, Generative Adversarial Networks (GANs) have been used to obtain impressive results in the task of image generation. However, the amount of data needed to successfully train a GAN can limit its application in fields where data acquisition is extremely costly or even unfeasible, such as architecture or medicine. In this work, we study two techniques that help mitigate this limitation, namely Adaptive Discriminator Augmentation (ADA) and Transfer Learning (TL), by training a StyleGAN2 model to generate images of glass façades with an extremely small training dataset. Several training runs are performed, varying certain parameters related to these techniques. After analysing the results, we conclude that both ADA and TL allow the network to converge faster, be more stable, and obtain generally higher-quality results, thus enabling the application of GANs to fields where data acquisition is limited, such as architecture, where they could serve architects as a useful tool to assist in the design of a façade.

Abstract: KNIT: Ontology Reusability through Knowledge Graph Exploration
Rodriguez-Revello, Jorge; Barba-González, Cristóbal; Rybinski, Maciej; Navas Delgado, Ismael. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Ontologies have become a standard for knowledge representation across several domains. In Life Sciences, numerous ontologies have been introduced to represent human knowledge, often providing overlapping or conflicting perspectives.
These ontologies are usually published as OWL or OBO files, and are often registered in open repositories, e.g., BioPortal. However, finding the concepts (classes and their properties) defined in existing ontologies and the relationships between these concepts across different ontologies, for example when developing a new ontology aligned with the existing ones, requires a great deal of manual effort in searching through the public repositories for candidate ontologies and their entities. In this work, we develop a new tool, KNIT, to automatically explore open repositories and help users fetch previously designed concepts using keywords. User-specified keywords are used to retrieve matching names of classes or properties. KNIT then creates a draft knowledge graph populated with the concepts and relationships retrieved from the existing ontologies. Furthermore, following the process of ontology learning, our tool refines this first draft of an ontology. We present three BioPortal-specific use cases for our tool, outlining the development of new knowledge graphs and ontologies in the biology sub-domains of genes and diseases, the virome, and drugs.

Abstract: A Data Analytics Methodology to Visually Analyze the Impact of Bias and Rebalancing
Lavalle, Ana; Maté, Alejandro; Trujillo, Juan; Teruel, Miguel A. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Data Analytics has become a key component of many business processes that influence several aspects of our daily life. Indeed, any misinterpretation or flaw in the outputs of Data Analytics can cause significant damage, especially when dealing with one of the most often overlooked issues, namely the unaware use of biased data. When data bias goes unnoticed, it warps the meaning of data, having a devastating effect on Data Analytics results.
Although it is widely argued that the most common way to deal with data bias is to rebalance biased datasets, rebalancing is not an aseptic transformation: it can lead to several potentially undesired side-effects that may harm the result of data analyses. Therefore, in order to analyze the underlying bias in datasets, in this work we present (i) a comprehensive methodology based on visualization techniques, which assists users in defining their analytical requirements so as to detect and visually represent data bias automatically, helping them find out whether it is appropriate to artificially rebalance their dataset or not; (ii) a novel metamodel for visually representing bias; (iii) a motivating real-world running example used to analyze the impact of bias on Data Analytics; and (iv) an assessment of the improvements introduced by our proposal through a complete real-world case study using a Fire Department Calls for Service dataset, thus demonstrating that rebalancing datasets is not always the best option. It is crucial to study the context in which decisions are going to be taken, and also important to carry out a pre-analysis in order to know the nature of the datasets and how biased they are.

Abstract: Feature Engineering of EEG applied to Mental Disorders: a Systematic Mapping Study
García Ponsoda, Sandra; García Carrasco, Jorge; Teruel, Miguel A.; Maté, Alejandro; Trujillo, Juan. Actas de las XXVIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2024), 2024-06-17.
Around a third of the total population of Europe suffers from mental disorders. The use of electroencephalography (EEG) together with Machine Learning (ML) algorithms to diagnose mental disorders has recently been shown to be a prominent research area, as exposed by several reviews focused on the field. Nevertheless, prior to the application of ML algorithms, EEG data should be correctly preprocessed and prepared via Feature Engineering (FE).
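A bare-bones illustration of the feature-extraction part of FE: splitting a 1-D signal into windows and computing simple per-window statistics (the signal, window size, and features are invented for illustration; real EEG pipelines use far richer features, such as frequency band powers):

```python
import math

def windows(signal, size):
    """Split a 1-D signal into consecutive non-overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def features(signal, size=4):
    """Per-window (mean, standard deviation) feature pairs."""
    feats = []
    for w in windows(signal, size):
        mean = sum(w) / len(w)
        var = sum((x - mean) ** 2 for x in w) / len(w)
        feats.append((mean, math.sqrt(var)))
    return feats

signal = [0.0, 1.0, 0.0, -1.0, 2.0, 2.0, 2.0, 2.0]
print(features(signal))
```

The choice of window size and features at this stage is precisely the kind of decision the mapping study surveys.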
In fact, the choice of FE techniques can make the difference between an unusable ML model and a simple, effective one. In other words, FE is crucial, especially when using complex, non-stationary data such as EEG. To this end, in this paper we present a Systematic Mapping Study (SMS) focused on FE from EEG data used to identify mental disorders. Our SMS covers more than 900 papers, making it one of the most comprehensive to date, to the best of our knowledge. From each paper we gathered the mental disorder addressed, all the FE techniques used, and the Artificial Intelligence (AI) algorithm applied for classification. Our main contributions are: (i) we offer a starting point for new researchers on these topics; (ii) we extract the most used FE techniques for classifying mental disorders; (iii) we show several graphical distributions of all the techniques used; and (iv) we provide critical conclusions for detecting mental disorders. To provide a better overview of existing techniques, the FE process is divided into three parts: (i) signal transformation, (ii) feature extraction, and (iii) feature selection. Moreover, we classify and analyze the distribution of existing papers according to the mental disorder they treat, the FE processes used, and the ML techniques applied. As a result, we provide a valuable reference for the scientific community to identify which techniques have been proven and tested and where the gaps are located in the current state of the art.