
Abstract:
On the Readiness of Scientific Data Papers for a Fair and Transparent Use in Machine Learning


Publisher

Sistedes

Published in

Actas de las XXIX Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2025)

Creative Commons License

Abstract

To ensure the fairness and trustworthiness of machine learning (ML) systems, recent legislative initiatives and research in the ML community have pointed out the need to document the data used to train ML models. In parallel, data-sharing practices in many scientific domains have evolved in recent years to support reproducibility. The adoption of these practices by academic institutions has encouraged researchers to publish their data, along with its technical documentation, in peer-reviewed publications such as data papers. In this study, we analyze how well this broader scientific data documentation meets the needs of the ML community and regulatory bodies for using the data in ML technologies. We examine a sample of 4041 data papers from different domains, assessing their coverage of the documentation dimensions these stakeholders request and the trends in that coverage. We then compare them with papers from an ML-focused venue (NeurIPS D&B) that publishes papers describing datasets. Based on this analysis, we propose a set of guidelines for data creators and scientific data publishers to improve their data's readiness for transparent and fairer use in ML technologies.

Description

About Giner-Miguelez, Joan

Keywords

Data Documentation, Data Papers, Machine Learning, Datasets, Fair AI

Citation

Giner-Miguelez, J., Gómez, A., Cabot, J.: On the Readiness of Scientific Data Papers for a Fair and Transparent Use in Machine Learning. In: Burgueño, L. (ed.) Actas de las XXIX Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2025). Sistedes (2025). https://hdl.handle.net/11705/JISBD/2025/75