It is easy to think of data migration as a movement problem. After all, we need to get our data from A to B with as little effort and cost as possible. With this viewpoint, many practitioners set about mapping and linking the source and target systems, forming an elaborate web of information chains.
This "movement focus" approach to data migration has prevailed for many years, but often at the cost of poor-quality data that is unfit for both the migration itself and the subsequent functionality of the target system.
Many people will say they are "doing data quality" because they’ve got data profiling, cleansing and testing phases clearly marked on their project plan. However, their overall migration strategy is still "movement oriented": profiling and cleansing are secondary tasks, bit-part players in the main mapping exercise.
I contend that data migration is not just a process of movement, but in fact a lesson in understanding.
You need to understand:
- How are we really using data in our legacy landscape?
- What business functions does our data serve and how will these functions change in the target?
- What is the quality of our legacy data?
- How will our legacy data quality impact our target functions?
- What are the gaps between the source and target data and functionality requirements, and how will they impact the migration?
This list barely scratches the surface of what we need to understand in a migration. It is no surprise that my favourite toolkit for a data migration includes a capable data quality management suite and a data discovery tool.
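Even a lightweight profiling pass can surface the completeness, conformity and duplication issues that a movement-focused plan never budgets for. The sketch below is a minimal illustration in Python with pandas, not a substitute for a proper data quality suite; the file name, column names and date format are hypothetical placeholders.

```python
import pandas as pd

# Load a hypothetical legacy extract -- file and column names are placeholders.
legacy = pd.read_csv("legacy_customers.csv", dtype=str)

# Basic per-column profile: completeness, cardinality and a sample value.
profile = pd.DataFrame({
    "non_null_pct": legacy.notna().mean() * 100,
    "distinct_values": legacy.nunique(),
    "example_value": legacy.apply(
        lambda col: col.dropna().iloc[0] if col.notna().any() else None
    ),
})
print(profile)

# A simple conformity check: how many rows match the expected date pattern?
date_ok = legacy["date_of_birth"].str.match(r"^\d{4}-\d{2}-\d{2}$", na=False)
print(f"date_of_birth conforms to YYYY-MM-DD in {date_ok.mean():.1%} of rows")

# Duplicate detection on a business key -- assumes customer_id should be unique.
dupes = legacy[legacy.duplicated(subset=["customer_id"], keep=False)]
print(f"{len(dupes)} rows share a customer_id with another row")
```

Numbers like these are what turn "we'll cleanse the data at some point" into a concrete, costed activity.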
Do you see how this differs from the typical "data mapping" methodologies that so many projects follow?
If you see data migration as an exercise in plumbing, then you inadvertently shift the focus away from truly understanding how your data is being used and what you want it to achieve in the target system. Target table schemas are just containers and yet so many projects view them as the primary objective – fill each container with the maximum amount of information and we’ll have achieved our goal.
The most successful data migration projects I have been involved with have succeeded because we built our data quality management capability first. Everything else hooked into this central core activity.
For example:
- How can you cost and forecast a data migration schedule if you don’t understand the data quality levels?
- How can you develop a master data strategy for consolidating schemas if you don’t know where the correct sources of data exist?
- How can you map source columns to target columns if you don’t know the meaning and usage of a particular attribute and what functions those values need to support? (A simple check of this kind is sketched after this list.)
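To make that last point concrete, a candidate mapping can be tested against what the target actually demands before any ETL code is written. This is a minimal sketch under stated assumptions: the mapping, the target constraints and the column names are all invented for the example, and a real target schema would supply far richer rules.

```python
import pandas as pd

# Hypothetical target constraints, as they might be derived from a target schema.
target_rules = {
    "cust_name":   {"max_length": 50, "nullable": False},
    "cust_status": {"allowed": {"ACTIVE", "LAPSED", "CLOSED"}, "nullable": False},
}

# Candidate source-to-target column mapping -- an assumption for the example.
mapping = {"customer_name": "cust_name", "status_code": "cust_status"}

legacy = pd.read_csv("legacy_customers.csv", dtype=str)

for source_col, target_col in mapping.items():
    rules = target_rules[target_col]
    values = legacy[source_col]
    if not rules.get("nullable", True):
        missing = values.isna().mean()
        print(f"{source_col} -> {target_col}: {missing:.1%} null, but target is NOT NULL")
    if "max_length" in rules:
        too_long = (values.dropna().str.len() > rules["max_length"]).mean()
        print(f"{source_col} -> {target_col}: {too_long:.1%} exceed length {rules['max_length']}")
    if "allowed" in rules:
        invalid = ~values.dropna().isin(rules["allowed"])
        print(f"{source_col} -> {target_col}: {invalid.mean():.1%} outside the target's allowed set")
```

If a mapping fails checks like these, the conversation moves from "how do we move it?" to "what does this attribute actually mean, and who owns fixing it?", which is exactly where it should be.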
Review your data migration methodology.
Is it movement-centric? Is it focused on ETL activities? Is there a bias towards data movement staff? Is data quality an afterthought, with a token data profiling task thrown in for good measure?
When you place more focus on understanding and managing data quality, you will naturally find that all of those mapping and "ETL-esque" activities become far easier to implement. Design decisions become easier to make and your testing requirements are based on a much deeper understanding of how the data will be used.
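As one hedged illustration of what quality-driven testing can look like, the sketch below reconciles a source extract against the loaded target on counts, business keys and a critical attribute, rather than simply confirming that the ETL job ran. The file, table and column names are again placeholders.

```python
import pandas as pd

# Placeholder extracts of the same business entity from source and target systems.
source = pd.read_csv("legacy_customers.csv", dtype=str)
target = pd.read_csv("target_customers.csv", dtype=str)

# Reconcile record counts for the migrated entity.
print(f"source rows: {len(source)}, target rows: {len(target)}")

# Reconcile on the business key: which records were dropped or invented in flight?
source_keys = set(source["customer_id"].dropna())
target_keys = set(target["customer_id"].dropna())
print(f"missing from target: {len(source_keys - target_keys)}")
print(f"unexpected in target: {len(target_keys - source_keys)}")

# Spot-check that a critical attribute survived the transformation intact.
merged = source.merge(target, on="customer_id", suffixes=("_src", "_tgt"))
mismatch = (
    merged["customer_name_src"].str.strip() != merged["customer_name_tgt"].str.strip()
).mean()
print(f"customer_name mismatches after migration: {mismatch:.1%}")
```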
What do you think? Do you approach your data migration methodology from a movement-centric or a data quality-centric viewpoint? Does it matter? I welcome your views.