Ingeniería y Ciencia de Datos
Permanent URI for this collection:
Articles in the category Ingeniería y Ciencia de Datos published in the Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022).
Browsing Ingeniería y Ciencia de Datos by publication date
Showing 1 - 13 of 13
Abstract: Injecting domain knowledge in multi-objective optimization problems: A semantic approach. Barba-González, Cristóbal; Nebro, Antonio J.; García-Nieto, José Manuel; Roldán-García, María del Mar; Navas Delgado, Ismael; Aldana-Montes, José F. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
In the field of complex problem optimization with metaheuristics, semantics has been used for modeling different aspects, such as problem characterization, parameters, decision-maker's preferences, or algorithms. However, there is a lack of approaches where ontologies are applied directly to the optimization process with the aim of enhancing it by allowing the systematic incorporation of additional domain knowledge. This is due to the high level of abstraction of ontologies, which makes them difficult to map onto the code implementing the problems and/or the specific operators of metaheuristics. In this paper, we present a strategy to inject domain knowledge (by reusing existing ontologies or creating a new one) into a problem implementation that will be optimized using a metaheuristic. This ontology-based approach thus enables building and exploiting complex computing systems in optimization problems. We describe a methodology to automatically induce user choices (taken from the ontology) into the problem implementations provided by the jMetal optimization framework. To illustrate our proposal, we focus on the urban domain. Concretely, we start by defining an ontology representing the domain semantics of a city (e.g., buildings, bridges, points of interest, routes, etc.) that allows a decision maker to define a-priori preferences in a standard, reusable, and formal (logic-based) way. We validate our proposal with several instances of two use cases, consisting of bi-objective formulations of the Traveling Salesman Problem (TSP) and the Radio Network Design problem (RND), both in the context of an urban scenario. The results of the experiments conducted show how the semantic specification of domain constraints is effectively mapped into feasible solutions of the tackled TSP and RND scenarios. This proposal represents a step forward towards the automatic modeling and adaptation of optimization problems guided by semantics, where the annotations of a human expert can now be considered during the optimization process.
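As a rough illustration of injecting ontology-derived knowledge into a problem evaluation, the sketch below (in Python rather than the authors' jMetal/Java setting; every name and value is invented) treats a hypothetical set of forbidden edges, imagined as the answer to an ontology query over an urban model, as a feasibility check inside a bi-objective TSP evaluation:

    # Hypothetical sketch: ontology-derived constraints injected into a
    # bi-objective TSP evaluation (total distance and a per-edge cost).
    import random

    N = 6
    random.seed(1)
    dist = [[abs(i - j) for j in range(N)] for i in range(N)]
    cost = [[random.randint(1, 9) for _ in range(N)] for _ in range(N)]
    # Pretend this set came from an ontology query, e.g. "roads crossing
    # a pedestrian-only zone" (invented for illustration).
    forbidden_edges = {(0, 3), (3, 0)}

    def evaluate(tour):
        """Return (total_distance, total_cost), or None if the tour
        violates a semantic constraint and must be discarded/repaired."""
        edges = list(zip(tour, tour[1:] + tour[:1]))
        if any(e in forbidden_edges for e in edges):
            return None
        return (sum(dist[a][b] for a, b in edges),
                sum(cost[a][b] for a, b in edges))

    tour = list(range(N))
    random.shuffle(tour)
    print(tour, evaluate(tour))

A metaheuristic would call such an evaluation for every candidate solution, so the semantic constraints steer the search towards feasible regions without touching the algorithm's own operators.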
Article: Studying the relationship between complexity and energy efficiency in relational databases. Poy García de Marina, Oscar; Calero Muñoz, Coral. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
Databases are a key part of any Information System (IS), as they are concerned with storing and retrieving the data required by the system. Among the different types of databases, relational ones are the most widely used. According to previous works, two measures affect the complexity of relational databases: the number of foreign keys (NFK) and the depth of the referential tree (DRT). However, a new concern has emerged in recent years: the energy consumption of software in general and, therefore, of databases as well. Bearing this in mind, the aim of this paper is to determine whether these complexity results for relational databases are related to the energy consumption of the database. To do that, we have measured the energy consumption required by four different relational database schemas, each one representing different values for NFK and DRT. Moreover, we have implemented and used the four schemas on four different database management systems. As a result, we can conclude that both measures have a noticeable impact not only on the complexity but also on the energy consumption of the database when querying.
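Since the study hinges on these two schema measures, a small Python sketch may help picture them (illustrative only; the toy schema is invented, and an acyclic foreign-key graph is assumed):

    # NFK = number of foreign keys; DRT = depth of the referential tree.
    from functools import lru_cache

    # Toy schema: each pair is one foreign key (child_table, referenced_table).
    fks = [("order_line", "orders"), ("order_line", "product"),
           ("orders", "customer"), ("product", "category")]

    NFK = len(fks)

    refs = {}
    for child, parent in fks:
        refs.setdefault(child, set()).add(parent)

    @lru_cache(maxsize=None)
    def depth(table):
        """Length of the longest chain of referential links from `table`."""
        return max((1 + depth(p) for p in refs.get(table, ())), default=0)

    tables = set(refs) | {p for ps in refs.values() for p in ps}
    DRT = max(depth(t) for t in tables)
    print(f"NFK={NFK}, DRT={DRT}")  # NFK=4, DRT=2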
Abstract: Efficient access methods for very large distributed graph databases. Luaces Cachaza, David; Ríos Viqueira, José Ramón; Cotos Yáñez, José Manuel; Flores, Julian. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
Subgraph search is a very important problem in the field of graph databases. It poses a major challenge with respect to the efficiency of solutions, owing to the presence of subgraph isomorphism, which is an NP-complete problem. Filter-Then-Verify (FTV) methods improve efficiency by using indexes that avoid evaluating subgraph isomorphism against every stored graph. In real applications, such as molecular substructure search, subgraph search must be applied to huge datasets (tens of millions of elements). Previous studies have identified two FTV solutions, GraphGrepSX (GGSX) and CT-Index, as the most efficient when applied to databases of thousands of elements; however, their efficiency on really large datasets is not adequate. This work proposes a generic approach for implementing FTV solutions in distributed processing environments. In addition, three previously proposed methods that improve the performance of GGSX and CT-Index are adapted to run on computing clusters. The evaluation shows how the proposed solutions provide large performance improvements in filtering time when executed on centralized architectures, and also enable efficient subgraph search over very large databases by exploiting the capabilities of distributed architectures.
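To make the filter/verify split concrete, here is a minimal Python sketch of the FTV pattern using networkx (illustration only; it does not reproduce the GGSX or CT-Index index structures, and the graphs are invented):

    import networkx as nx
    from networkx.algorithms.isomorphism import GraphMatcher

    def may_contain(g, q):
        # Cheap necessary condition: each query node must be mappable to
        # a node of degree at least its own (compare degree sequences).
        gd = sorted((d for _, d in g.degree()), reverse=True)
        qd = sorted((d for _, d in q.degree()), reverse=True)
        return len(gd) >= len(qd) and all(a >= b for a, b in zip(gd, qd))

    def ftv_search(query, database):
        candidates = [g for g in database if may_contain(g, query)]  # filter
        # Verify the survivors with exact (NP-complete) subgraph isomorphism.
        return [g for g in candidates
                if GraphMatcher(g, query).subgraph_is_isomorphic()]

    db = [nx.cycle_graph(4), nx.path_graph(3), nx.complete_graph(4)]
    print(len(ftv_search(nx.path_graph(3), db)))  # 2: K4 passes the filter
                                                  # but fails verification

A real FTV index precomputes structural features for all stored graphs, so the filtering step becomes an index lookup rather than a per-graph scan; distributing that filtering work is what the cluster adaptation targets.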
Abstract: TITAN: A knowledge-based platform for Big Data workflow management. Benítez-Hidalgo, Antonio; Barba-González, Cristóbal; García-Nieto, José Manuel; Gutierez-Moncayo, Pedro; Paneque, Manuel; Nebro, Antonio J.; Roldán-García, María del Mar; Aldana-Montes, José F. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
Modern applications of Big Data are transcending from being scalable solutions for data processing and analysis to providing advanced functionalities with the ability to exploit and understand the underpinning knowledge. This change is promoting the development of tools at the intersection of data processing, data analysis, knowledge extraction and management. In this paper, we propose TITAN, a software platform for managing the whole life cycle of science workflows, from deployment to execution, in the context of Big Data applications. This platform is characterised by a design and operation mode driven by semantics at different levels: data sources, problem domain and workflow components. The proposed platform is developed upon an ontological framework of meta-data that consistently manages processes and models and takes advantage of domain knowledge. TITAN comprises a well-grounded stack of Big Data technologies, including Apache Kafka for inter-component communication, Apache Avro for data serialisation and Apache Spark for data analytics. A series of use cases are conducted for validation, comprising workflow composition and semantic metadata management in academic and real-world fields of human activity recognition and land-use monitoring from satellite images.
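As a rough picture of how components in such a stack talk to each other, the following Python sketch Avro-serialises a record and publishes it on Kafka (illustration only; it uses the kafka-python and fastavro client libraries, which the abstract does not name, the schema and topic are invented, and a broker at localhost:9092 is assumed):

    import io
    from fastavro import schemaless_writer
    from kafka import KafkaProducer

    # Invented Avro schema for a sensor reading.
    schema = {
        "type": "record", "name": "Reading",
        "fields": [{"name": "sensor", "type": "string"},
                   {"name": "value", "type": "double"}],
    }

    def encode(record):
        buf = io.BytesIO()
        schemaless_writer(buf, schema, record)  # compact schema-less Avro body
        return buf.getvalue()

    producer = KafkaProducer(bootstrap_servers="localhost:9092",
                             value_serializer=encode)
    producer.send("titan-readings", {"sensor": "s1", "value": 0.42})
    producer.flush()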
Article: Minería de flujos de datos en entornos heterogéneos y distribuidos: aplicación en la Industria 4.0. Dintén, Ricardo; López Martínez, Patricia; Yebenes, Juan; Zorrilla, Marta Elena. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
One of the main objectives of Industry 4.0 is to achieve the necessary horizontal and vertical integration of the production system. This requires deploying a digital platform that integrates and processes the huge amount of data generated in the environment. Much of this information comes from the IoT and, specifically, from sensors emitting continuous data streams whose analysis by means of data mining techniques would make it possible to improve industrial processes, for instance by building models for the preventive and predictive maintenance of physical systems, where open challenges remain. The purpose of this article is to describe the starting point of this research, which is the result of a national-plan project, and to discuss its extension, pointing out the lines of work to be addressed and the results pursued in order to contribute to the advancement of Industry 4.0.

Abstract: FIMED: Flexible management of biomedical data. Hurtado Requena, Sandro; García-Nieto, José Manuel; Navas Delgado, Ismael; Aldana-Montes, José F. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
In the last decade, clinical trial management systems have become an essential support tool for data management and analysis in clinical research. However, these clinical tools have design limitations, since they are currently not able to cover the need to adapt to continuous changes in trial practice, owing to the heterogeneous and dynamic nature of clinical research data. These systems are usually proprietary solutions provided by vendors for specific tasks. In this work, we propose FIMED, a software solution for the flexible management of clinical data from multiple trials, moving towards personalized medicine, which can contribute positively by improving the quality and ease of clinical researchers' work in clinical trials. This tool allows a dynamic and incremental design of patients' profiles in the context of clinical trials, providing a flexible user interface that hides the complexity of using databases. Clinical researchers are able to define personalized data schemas according to their needs and clinical study specifications. Thus, FIMED allows the incorporation of separate clinical data analyses from multiple trials. The efficiency of the software has been demonstrated by a real-world use case for a clinical assay in melanoma, which has been anonymized to provide a user demonstration. FIMED currently provides three data analysis and visualization components, guaranteeing a clinical exploration of gene expression data: heatmap visualization, cluster heatmap visualization, and gene regulatory network inference and visualization. An instance of this tool is freely available on the web at https://khaos.uma.es/fimed. It can be accessed with a demo user account, "researcher", using the password "demo". Category: COMPUTER SCIENCE, THEORY & METHODS. Ranking: 13/110. Journal: COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE. Year: 2021. DOI: https://doi.org/10.1016/j.cmpb.2021.106496.
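The "dynamic and incremental design of patients' profiles" described above can be pictured with a toy Python sketch (purely illustrative; FIMED is a database-backed web tool and this is not its API, and every field name is invented):

    # A trial's schema can grow while data collection is under way;
    # each record is validated against the schema as it stands.
    trial_schema = {"patient_id": str, "age": int}   # initial design
    records = []

    def extend_schema(field, ftype):
        """Add a field incrementally, without migrating stored records."""
        trial_schema[field] = ftype

    def add_record(rec):
        for field, value in rec.items():
            assert field in trial_schema, f"unknown field {field}"
            assert isinstance(value, trial_schema[field]), f"bad type: {field}"
        records.append(rec)

    add_record({"patient_id": "P001", "age": 54})
    extend_schema("braf_status", str)                # mid-trial change
    add_record({"patient_id": "P002", "age": 61, "braf_status": "V600E"})
    print(records)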
Abstract: A Compact Representation of Indoor Trajectories. Fariña, Antonio; Gutiérrez-Asorey, Pablo; Ladra, Susana; Penabad, Miguel R.; Varela Rodeiro, Tirso. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
We present a system that combines indoor positioning with a compression algorithm for trajectories in the context of a nursing home. Our aim is to gather and effectively represent the locations of residents and caregivers over time, while allowing efficient access to those data. We briefly show the system architecture that enables the automatic tracking of users' movements and, consequently, the gathering of their locations. Then, we present indRep, our compact representation for handling positioning data using grammar-based compression, and provide two basic operations that enable pseudo-random access to the data. Finally, we include experiments showing that indRep is competitive with well-known general-purpose compressors in terms of compression effectiveness and also provides fast access to the compressed data. We expect both features to enable exploitation functionalities even on computers with rather low computational resources.
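The combination of grammar-based compression and pseudo-random access that indRep relies on can be illustrated with a minimal Python sketch (not indRep's actual structure; the grammar is invented). Each nonterminal expands to two symbols, and knowing each rule's expanded length lets access(i) descend to one position without decompressing the rest:

    # Straight-line grammar for the sequence "abababab" (8 cells).
    rules = {"X": ("a", "b"), "Y": ("X", "X"), "S": ("Y", "Y")}

    def length(sym):
        # A real representation would store these lengths with the rules.
        if sym not in rules:
            return 1
        left, right = rules[sym]
        return length(left) + length(right)

    def access(sym, i):
        """Return the i-th terminal generated by `sym` (0-based)."""
        while sym in rules:
            left, right = rules[sym]
            n = length(left)
            if i < n:
                sym = left
            else:
                sym, i = right, i - n
        return sym

    print("".join(access("S", i) for i in range(length("S"))))  # abababab
    print(access("S", 5))  # b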
Article: Marco metodológico para la creación, implantación y mantenimiento de Sistemas de Gobierno de Datos. Caballero Muñoz-Reja, Ismael; Piattini Velthuis, Mario Gerardo; Gualo, Fernando. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
In order to obtain the performance that organizations expect from their data, they need those data to be governed in accordance with their organizational strategies. To this end, they have to develop and implement Data Governance Systems. The design, implementation and maintenance of these systems require the deployment and coordination of several elements. In practice, however, we have observed that organizations run into a series of problems that prevent them from implementing data governance systems that meet their needs and expectations. This article proposes a methodological framework, composed of a data governance system model and a process for its design, implementation and maintenance, aligned with the international standard ISO/IEC 38505 and based on the Alarcos Data Improvement Model (MAMDv3.0). The process (KCAE) consists of four stages: Knowledge (to determine what is known about the data and what needs to be known), Control (to determine whether the control exercised over the data through policies is sufficient for the required objective), Adaptation (to steer the organization in transforming the current control into the required control), and Exploitation (to monitor whether the adaptations and the control are sufficient to achieve the desired business goals). Finally, the article describes how the framework was used in the specific case of a service at a Spanish university.

Article: A model-driven approach for the definition of reproducible and replicable data analysis projects. González, Francisco Javier Melchor; Rodríguez-Echeverría, Roberto; Conejero, José María. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
It is becoming increasingly common to exploit the data collected by Information Systems, analysing them to draw conclusions that drive decisions in different research fields. The fact that in most cases these conclusions cannot be properly backed up has given rise to a reproducibility crisis in Data Science, the discipline that makes it possible to convert such data into knowledge, and in the research fields that apply it. In this paper we envision a Model-Driven framework to foster reproducible and replicable Data Science projects. The framework proposes the definition of systematic pipelines that may be (semi)automatically executed on concrete implementation platforms. Proprietary or third-party tools are also considered, so that flexibility may be ensured without hindering reproducibility.

Abstract: Ontology-Driven Approach for KPI Meta-modelling, Selection and Reasoning. Roldán-García, María del Mar; García-Nieto, José Manuel; Maté, Alejandro; Trujillo, Juan; Aldana-Montes, José F. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.

Abstract: Reconstruction of Gene Regulatory Networks with Multi-objective Particle Swarm Optimisers. Hurtado Requena, Sandro; García-Nieto, José Manuel; Navas Delgado, Ismael; Nebro Urbaneja, Antonio; Aldana-Montes, José F. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
The computational reconstruction of Gene Regulatory Networks (GRNs) from gene expression data has been modelled as a complex optimisation problem, which enables the use of sophisticated search methods to address it. Among these techniques, particle swarm optimisation based algorithms stand out as prominent techniques with fast convergence and accurate network inference. A multi-objective approach to the inference of GRNs consists of optimising a given network's topology while tuning the kinetic order parameters of an S-System, which avoids unnecessary penalty weights and enables the adoption of Pareto optimality based algorithms. In this study, we empirically assess the behaviour of a set of multi-objective particle swarm optimisers based on different archiving and leader selection strategies in the scope of GRN inference. The main goal is to provide systems biologists with experimental evidence about which optimisation technique performs with higher success for the inference of consistent GRNs. The experiments conducted involve time-series datasets of gene expression taken from the DREAM3/4 standard benchmarks, as well as in vivo datasets from IRMA and melanoma cancer samples. Our study shows that the multi-objective particle swarm optimiser OMOPSO obtains the best overall performance. Inferred networks show biological consistency in accordance with in vivo studies in the literature.

Abstract: The bdpar Package: Big Data Pipelining Architecture for R. Ferreiro-Díaz, Miguel; Cotos-Yáñez, Tomás R.; Méndez, José R.; Ruano Ordas, David. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
In the last years, big data has become a useful paradigm for taking advantage of multiple sources to find relevant knowledge in real domains (such as the design of personalized marketing campaigns or helping to palliate the effects of several fatal diseases). Big data programming tools and methods have evolved over time from a MapReduce to a pipeline-based archetype. Concretely, the use of pipelining schemes has become the most reliable way of processing and analyzing large amounts of data. To this end, this work introduces bdpar, a new highly customizable pipeline-based framework (using the OOP paradigm provided by the R6 package) able to execute multiple preprocessing tasks over heterogeneous data sources. Moreover, to increase flexibility and performance, bdpar provides helpful features such as (i) the definition of a novel object-based pipe operator (%>|%); (ii) the ability to easily design and deploy new (and customized) input data parsers, tasks, and pipelines; (iii) only-once execution, which avoids re-running previously processed information (instances), guaranteeing that only new input data and pipelines are executed; (iv) the capability to perform serial or parallel operations according to the user's needs; and (v) a debugging mechanism that allows users to check the status of each instance (and find possible errors) throughout the process.

Article: Search! Motor de búsqueda e integración de datos abiertos. Berenguer Pastor, Alberto; Mazón, José-Norberto; Tomás, David. Actas de las XXVI Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2022), 2022-09-05.
Open data has become an important resource for adding value to software products and services. Open data is usually accessed through portals where various public institutions publish their data, often in tabular formats such as spreadsheets or CSV files. Most open data portals offer a keyword-based search engine over the metadata. This entails two problems: (i) open data search depends on the quality of the metadata (often not adequate enough), which can lead to irrelevant results; and (ii) keyword search does not take into account the tabular nature of (most) open data, which can lead to results that are not useful for integration with other related tabular data. Moreover, nowadays a search must be performed separately on each public institution's open data portal, which is highly tedious. To overcome these problems, this article presents Search!, a search engine that allows users to provide a query table as input in order to retrieve relevant tabular open data to integrate with it.
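The idea of querying with a table instead of keywords can be pictured with a toy Python sketch (illustrative only; Search!'s actual matching and ranking are not described in the abstract, and the catalog is invented). Candidate datasets are ranked by the overlap of their column names with the query table's, a crude proxy for being joinable or unionable:

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    # Toy catalog of tabular open datasets: file name -> column names.
    catalog = {
        "air-quality.csv": ["station", "date", "no2", "pm10"],
        "bus-stops.csv": ["stop_id", "lat", "lon"],
        "traffic.csv": ["station", "date", "vehicles"],
    }

    query_columns = ["station", "date", "pm10"]
    for name, cols in sorted(catalog.items(),
                             key=lambda kv: jaccard(query_columns, kv[1]),
                             reverse=True):
        print(f"{jaccard(query_columns, cols):.2f}  {name}")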