Data warehouses (DWs) integrate several heterogeneous data sources into multidimensional structures (i.e., facts and dimensions) in support of the decision-making process in Business Intelligence. The development of a DW is therefore a complex process that must be carefully planned in order to meet user needs. Three different approaches to DW development have been proposed, similar to those existing in Software Engineering: bottom-up or supply-driven, top-down or demand-driven, and hybrid. The hybrid approach makes use of both data sources and user requirements, and so avoids discovering only after the DW has been built that information from one of the two was missed.
However, following the hybrid approach raises a new problem. DW elements are merged in order to combine the information from requirements and data sources, each of which uses a different terminology. As a result, implicit traceability is lost, which hurts requirements validation, prevents us from tracing each requirement, and dramatically increases the cost of introducing changes.
In order to solve this problem, in this paper we perform a thorough review of the literature on traceability and, given the particular characteristics of DW development, propose a novel trace metamodel specifically tailored to face several challenges: (i) connecting multiple sources with multiple targets in a meaningful way, as requirements need to be reconciled with data sources that may or may not match the expectations of the users; (ii) being weakly coupled with DW models, as these models can change since there is no standard; and (iii) minimizing the overhead that traceability introduces into the development process, by defining how traces should be generated automatically and maintaining them without user intervention wherever possible.
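As an informal illustration of challenges (i) and (ii), a trace link can be pictured as a typed relationship between several source elements and several target elements, where each element is referenced only by identifier so that the trace model does not depend on any particular DW metamodel. The following Python sketch is not the metamodel proposed in the paper; the class names (`ElementRef`, `TraceLink`, `TraceModel`) and the link kind used in the usage note are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ElementRef:
    """Reference to a model element by name only (weak coupling, hypothetical)."""
    model: str       # e.g. "requirements", "data_sources", "conceptual"
    element_id: str  # identifier of the element inside that model


@dataclass
class TraceLink:
    """A typed link connecting many source elements to many target elements."""
    kind: str
    sources: List[ElementRef]
    targets: List[ElementRef]


@dataclass
class TraceModel:
    """A plain container of trace links, kept separate from the DW models."""
    links: List[TraceLink] = field(default_factory=list)

    def add(self, kind, sources, targets):
        link = TraceLink(kind, list(sources), list(targets))
        self.links.append(link)
        return link

    def targets_of(self, ref):
        """Forward tracing: every element reachable in one step from ref."""
        found = []
        for link in self.links:
            if ref in link.sources:
                for target in link.targets:
                    if target not in found:
                        found.append(target)
        return found

    def sources_of(self, ref):
        """Backward tracing: every element that ref was derived from."""
        found = []
        for link in self.links:
            if ref in link.targets:
                for source in link.sources:
                    if source not in found:
                        found.append(source)
        return found
```

For instance, a single link of a hypothetical kind `"reconciles"` could take one requirement element and one data source table as sources and one conceptual fact as target; `sources_of` on that fact would then return both origins, despite their different terminology.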
First, we introduce the semantics included in the metamodel, which cover the different relationships involved in DW development. Then, we describe how traces can be integrated within DW development by means of trace models. Afterwards, we show how these trace models can be aligned with the Model Driven Architecture (MDA) framework in order to semi-automatically generate traces within the DW development process. We show how to generate traces from user requirements to conceptual DW models by means of Query/View/Transformation (QVT) rules, thus saving the time and cost required to record traces. Furthermore, we describe how traces can be maintained without requiring human intervention when changes are introduced into the DW. Finally, we show how the framework can be implemented within the Eclipse platform and how the results are integrated into a DW development approach.
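The idea of generating traces as a by-product of a model transformation can be sketched outside of QVT as well. The Python fragment below is only an analogy, not the QVT rules used in the paper: the function name `derive_conceptual_model`, the `Fact_` naming scheme, and the dictionary-based model representation are all assumptions made for illustration. Each time the rule creates a target element, it records a (source, target) pair, so no trace has to be written by hand.

```python
def derive_conceptual_model(requirements):
    """Map each requirement to a fact and record a trace for every element created.

    requirements: dict mapping a requirement name to a list of measure names
                  (hypothetical, simplified representation).
    Returns (facts, traces), where traces is a list of (source_id, target_id).
    """
    facts = {}
    traces = []
    for req_name, measures in requirements.items():
        fact_name = "Fact_" + req_name
        facts[fact_name] = list(measures)
        # Trace from the requirement to the fact derived from it.
        traces.append((req_name, fact_name))
        for measure in measures:
            # Trace from each required measure to the corresponding fact attribute.
            traces.append((req_name + "." + measure, fact_name + "." + measure))
    return facts, traces


def impacted_targets(traces, changed_source):
    """Return the target elements traced from a changed source element."""
    return [target for source, target in traces if source == changed_source]
```

In this simplified setting, maintenance amounts to re-running the rule after the requirements change, which regenerates the trace list along with the model; `impacted_targets` then gives a rough picture of which DW elements a changed requirement affects.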
In order to show the applicability of our proposal, we present an example based on a real case study, carried out with another university, that involved designing several data marts for educational analysis. As shown in Figure 1, our framework allows us to trace each requirement, as well as any modifications, to its corresponding elements in the DW. The main benefits of our proposal are improved requirements validation and the ability to easily assess the impact of changes and regenerate the affected parts.
Our plans for the immediate future are to develop a new set of QVT rules to explore the relationships between the conceptual and logical models, and to explore the potential of using the information recorded in the traces to support automated analysis. We will also complete our development of the traceability framework in order to make the maintenance of traces as automatic as possible.
Acknowledgments. This work has been partially supported by the MESOLAP (TIN2010-14860) and SERENIDAD (PEII-11-0327-7035) projects from the Spanish Ministry of Education and the Junta de Comunidades de Castilla La Mancha. Alejandro Maté is funded by the Generalitat Valenciana under the grant ACIF/2010/298.