THIS IS HARD TO DO! In our agile world we never seem to get the data model completed until two weeks after we are in production, yet every project plan wants to waterfall the completion of this deliverable. I think it may be due to the rapid way we gather and refine requirements.
For example, on one of the last projects I worked on, we first modeled a prototype of the reporting and analytics layer of the data warehouse for use with the front-end reporting tool. We built and populated this layer based on the requirements for one specific (and very important) report. As the requirements were refined using the prototype, the data model progressed along with them. We had set a two-week timeline for this effort. At the end of those two weeks, we still had to do the following:
- Refine the model with definitions for the presentation layer.
- Create the data warehouse layer with definitions and relationships.
- Create the staging layer with definitions.
- Validate the definitions and presentation layer design with the business users.
When the developers started using the data model to create the technical design, a few more changes surfaced, so the model changed again. During build we had to tweak the model just a bit more, but that meant MORE CHANGES. Finally, after build in the development environment, with a volume of test data loaded, we had to change the model yet again to attain the query performance we wanted.
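To make that last round of changes concrete, here is a minimal sketch of the kind of performance-driven tweak involved. SQLite stands in for the actual warehouse database, and the table, columns, and query are invented for illustration; the only point is the pattern of loading a volume of data, checking the query plan, and adjusting the physical model.

```python
# Hypothetical sketch: table names and columns are invented, not from the project.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A simplified fact table from the warehouse layer (hypothetical schema).
cur.execute("""
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        sale_date   TEXT    NOT NULL,
        amount      REAL    NOT NULL
    )
""")

# Load a volume of test data, as was done in the development environment.
cur.executemany(
    "INSERT INTO fact_sales (customer_id, sale_date, amount) VALUES (?, ?, ?)",
    [(i % 1000, f"2024-01-{(i % 28) + 1:02d}", i * 0.5) for i in range(100_000)],
)

# Before the change: the report query scans the whole table.
print(cur.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM fact_sales WHERE customer_id = 42"
).fetchall())

# The model change: an index added purely to hit the query-time target.
cur.execute("CREATE INDEX ix_fact_sales_customer ON fact_sales (customer_id)")

# After the change: the same query now uses the index.
print(cur.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM fact_sales WHERE customer_id = 42"
).fetchall())
```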
The data modeler is needed during the whole project, but maybe not full-time after the first month or two.
1 Comment
Joyce,
Surely not your intention, but the article may leave the reader with the impression that data modeling has to be performed in parallel with the development of, e.g., a reporting and analytics layer, or is a task that has to be done "on top".
Data modeling is definitely an integral part of development.
The right data modeling tool will support the denormalization process from the logical model to the physical model, keep track of column lineage, and let you generate the DDL for the target database. For the (inevitable) case of iteration (we humans often find issues created in the first step only while working on the second), the tool should be able to cascade modifications from the logical model to the physical model. Also, if the modeling process was "bypassed", features to reverse engineer the database and synchronize the logical model will come in handy.
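To picture the lineage idea described above, here is a toy sketch in Python. It is not any particular modeling tool's API, and every name in it is invented; it only shows how keeping logical-to-physical column lineage explicit lets the DDL be regenerated as the logical model iterates.

```python
# Toy illustration of logical-to-physical lineage; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class PhysicalColumn:
    name: str
    data_type: str
    source: str  # lineage: the "LogicalEntity.attribute" this column came from


@dataclass
class PhysicalTable:
    name: str
    columns: list[PhysicalColumn] = field(default_factory=list)

    def ddl(self) -> str:
        # Generate DDL for the target database, annotating each column's lineage.
        cols = ",\n".join(
            f"    {c.name} {c.data_type}  -- from {c.source}" for c in self.columns
        )
        return f"CREATE TABLE {self.name} (\n{cols}\n);"


# A denormalized reporting table that traces each column back to the logical model.
dim_customer = PhysicalTable(
    "dim_customer",
    [
        PhysicalColumn("customer_key", "INTEGER", "Customer.id"),
        PhysicalColumn("customer_name", "VARCHAR(100)", "Customer.name"),
        PhysicalColumn("region_name", "VARCHAR(50)", "Region.name"),  # folded in
    ],
)

print(dim_customer.ddl())
```

Because each physical column records its logical source, a change to the logical model can be cascaded by regenerating only the affected columns and their DDL rather than reworking the physical model by hand.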