How to keep your data migration project on the rails

Why do so many data migration projects fall off the rails? I’ve been asked this question many times and, whilst there are many reasons, perhaps the most common is a bias towards finding the wrong kind of data quality gaps.

Projects often tear off at breakneck speed, validating and cleansing all manner of data quality problems, without really understanding the big-picture issues that can call into question the entire migration strategy.

Why does this problem persist? Read More »
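Before cleansing individual fields, it pays to look for the big-picture gaps first. As a minimal sketch of what that can look like (the table names are hypothetical, and sqlite3 stands in for whatever databases are actually involved), a source-to-target reconciliation of record counts per entity can expose whole categories of missing data before any field-level work begins:

```python
# Big-picture gap check: reconcile record counts per entity between the
# legacy source and the migration target before any field-level cleansing.
# Table names and database files are hypothetical.

import sqlite3  # stand-in for the real source and target systems

ENTITIES = ["customers", "orders", "invoices"]

def count_rows(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

source = sqlite3.connect("legacy.db")
target = sqlite3.connect("migrated.db")

for entity in ENTITIES:
    src, tgt = count_rows(source, entity), count_rows(target, entity)
    status = "OK" if src == tgt else f"GAP: {src - tgt:+d} records"
    print(f"{entity}: source={src}, target={tgt} -> {status}")
```

If the counts disagree at this level, no amount of field-by-field cleansing will save the migration strategy.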


Data diversity

I have consulted on enough enterprise system implementations to know that there's anything but uniformity in how organizations roll out a new set of mature applications. I've seen plenty of different methodologies and technologies applied to relatively similar back-office systems (read: ERP and CRM). Of course, some were better than others, although the results were remarkably consistent, as I describe in Why New Systems Fail.

Those internal applications differ from technologies that handle what we now call big data on two fundamental levels. First, big data technologies are much less mature and, I would argue, relatively poorly understood by comparison. Second, while certainly essential to running myriad businesses, ERP and CRM applications house a relatively inconsequential amount of information. As Jordan Robertson writes in BusinessWeek:

The universe of data being generated and collected today is magnitudes larger than ever before. Companies are combing content that's online (blogs and social media) and offline (DMV and criminal records), as well as the growing amount of bits being spewed by the billions of Internet-connected devices (smartphones and thermostats). Computer-storage maker EMC estimates the amount of digital information in the world in 2020 will swell to 50 times what it is today.

And you thought that big data was, well, big today. We ain't seen nothin' yet. Read More »


The low ethics of high-frequency trading

Imagine if your ability to feed your family depended upon how fast you could run. Imagine the aisles of your grocery store as lanes on a running track. If you can outrun your fellow shoppers, grab food off the shelves and race through the checkout at the finish line, then your family gets to eat. If not, then your family starves.

This "survival of the fleetest" would obviously be considered an unethical way to determine whose family gets to eat today. So why should it be considered an ethical way to determine whose family gets to invest for tomorrow?

In my last two posts on data ethics, I explored the questionable practices of free internet services and their users. This post looks at ethical problems in a new area: the multi-billion dollar world of high-frequency trading, which is responsible for almost 99% of trades on the U.S. stock market. High-frequency trading works like this: traders with access to high-speed fiber-optic cables and advanced algorithms can buy and sell shares in milliseconds. That access, however, costs millions of dollars per year, meaning only a wealthy elite can afford it. Read More »
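To see why the speed gap matters, here's a toy sketch (not a model of any real exchange; all latencies are invented for illustration) showing that even a modest latency advantage lets the faster trader win essentially every race to the exchange:

```python
# Toy illustration of latency advantage (hypothetical numbers throughout).
# Two traders react to the same public price signal; whoever's order
# arrives first gets the fill. The jitter models network variance, but
# the latency gap dwarfs it, so the faster trader wins almost every time.

import random

def run_race(trials: int = 10_000,
             fast_latency_us: float = 50,
             slow_latency_us: float = 500) -> dict:
    wins = {"fast": 0, "slow": 0}
    for _ in range(trials):
        fast_arrival = fast_latency_us + random.uniform(0, 10)
        slow_arrival = slow_latency_us + random.uniform(0, 10)
        wins["fast" if fast_arrival < slow_arrival else "slow"] += 1
    return wins

print(run_race())  # with this gap: {'fast': 10000, 'slow': 0}
```

The grocery-store race in the opening analogy plays out the same way: the outcome is decided by who can pay for speed, not by who values the shares more.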


Cracking the code to successful conversions: establish a testing and QA approach

Testing, testing, testing...do we really need testing? The answer is YES! Always! The big questions are:

1. How do I test?
2. What do I test?
3. Do we just do program testing, or do we also test data quality?
4. What about volume testing?
5. Who signs off on test completion? Read More »
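To make question 3 concrete, here is a minimal sketch of what testing data quality (rather than just program logic) might look like. It assumes the converted data lands in a pandas DataFrame; the column names, file name and tolerance thresholds are all hypothetical:

```python
# Minimal data quality checks on a converted table (hypothetical schema).
# Run these alongside program testing so that bad data, not just bad code,
# fails the conversion.

import pandas as pd

def check_converted_customers(df: pd.DataFrame) -> list[str]:
    """Return a list of data quality failures; an empty list means pass."""
    failures = []
    if df["customer_id"].duplicated().any():
        failures.append("duplicate customer_id values")
    if df["email"].isna().mean() > 0.02:  # tolerate at most 2% missing
        failures.append("more than 2% of emails are missing")
    if not df["created_date"].between("1990-01-01", pd.Timestamp.today()).all():
        failures.append("created_date outside the plausible range")
    return failures

df = pd.read_csv("converted_customers.csv", parse_dates=["created_date"])
for failure in check_converted_customers(df):
    print("FAIL:", failure)
```

Volume testing (question 4) is then a matter of running the same checks, plus timings, against a full-size extract rather than a sample.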


5 common data quality project mistakes (and how to resolve them)

Over the course of the last eight years, I've interviewed countless data quality leaders and learned so much about the common mistakes and failures they've witnessed in past projects.

In this post I want to highlight five of those common issues and give some practical ideas for resolving them:

#1: Not connecting data priorities to business priorities

One of the biggest data quality frustrations I’ve witnessed in the business community is a lack of focus on tangible business issues. Data quality improvement efforts invariably adopt a data-centric viewpoint, which can often alienate business people who just can’t make heads or tails of what the ‘data team’ are relaying back to them. On the flip side, data quality practitioners often pull their hair out because they can’t see why the business doesn’t ‘get it’.

To resolve this, you obviously need to ensure that your data quality goals address real business objectives that stakeholders have a personal stake in achieving. Read More »


Better memory through data

We all lose things. Some of us are just better at finding them than others.

I had to remind myself of that fact the other night in Las Vegas. I went to dinner with a friend at Brio, an Italian restaurant in the Town Square shopping center on The Strip. As usual, I drove my car and dutifully parked it, but not right next to the restaurant. (This is an important fact.) For those of you who don't know, just about everything in Vegas is big. Big casinos, big parking lots, big stakes and sometimes big mistakes.

A few years ago, I forgot which hotel I had parked at. (Getting the floor wrong is inconvenient enough, never mind the wrong building!) Two hours later, I finally found my Acura on the third floor of Bally's Casino. I vowed then to always take a picture of a sign near where I park my car – e.g., A4 or C3. My iPhone makes that easy enough to do.

For some reason, though, I didn't follow my standard operating procedure this time – and I would soon pay the price. Read More »


Mapping ethics in a data-driven world

In my previous post, I examined ethics in a data-driven world with an example of how Facebook experiments on its users. Acknowledging the conundrum facing users of free services like Facebook, Phil Simon commented that “users and customers aren’t the same thing. Maybe users are there to be, you know... used.”

What about when a free service allows its users to reach local customers and direct them to their business location? That's the case with Google Maps. Interestingly, some of these users go a step further and try to use the free service to pull customers away from their competitors.

As Kevin Poulsen blogged, “Google Maps is, at its heart, a massive crowdsourcing project, a shared conception of the world that skilled practitioners can bend and reshape in small ways using tools like Google’s Mapmaker or Google Places for Business. Beneath its slick interface and crystal clear GPS-enabled vision of the world, Google Maps roils with local rivalries, score-settling, and deception. Maps are dotted with thousands of spam business listings for nonexistent locksmiths and plumbers. Legitimate businesses sometimes see their listings hijacked by competitors or cloned into a duplicate with a different phone number or website. In January, someone bulk-modified the Google Maps presence of thousands of hotels around the country, changing the website URLs to a commercial third-party booking site (which siphons off the commissions).”

These are the ethical dilemmas of cartography in the era of big data. Today, the maps we draw of our data-driven world depend on both geographical information and data points. However, these data points now come from a wide variety of sources, not all of which are interested in pointing you in the right direction. Read More »


Cracking the code to successful conversions: establish issue management

Let’s get this out of the way: EVERY PROJECT HAS ISSUES at one time or another. Sometimes a technical issue needs to be communicated to the project manager, escalated to gain resolution or simply shared within the project team. I always seem to hit issues during testing, and they never fail to keep me up at night. My issues usually revolve around:

  1. Source system data layouts or file definitions.
  2. Data quality problems that were overlooked in the source systems and now must be resolved during conversion.
  3. Data volume – What? You mean 10 million records on an initial load is BAD?
  4. Target system data designs that are not optimal for query performance.
  5. Extraction, transformation and load programs that weren't updated when the input data layout changed, or that have a few translation issues (see the sketch after this list).

I bet you’ve figured it out by now – my life is surrounded by DATA. So, any planning and mitigation we can do upfront to address these issues is totally worth it! Read More »
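As one small example of planning upfront for issue 5, here is a hedged sketch of a pre-load layout check. The expected column list and file name are hypothetical; in practice they would come from the source system's file definitions:

```python
# Guard against silent input layout changes before running an ETL load.
# The expected layout is hypothetical; in practice it comes from the
# source system's file definition.

import csv

EXPECTED_COLUMNS = ["customer_id", "name", "email", "created_date"]

def verify_layout(path: str) -> None:
    """Fail fast if the input file's header doesn't match the expected layout."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    if header != EXPECTED_COLUMNS:
        raise ValueError(
            f"Input layout changed: expected {EXPECTED_COLUMNS}, got {header}"
        )

verify_layout("source_extract.csv")  # raise before the load, not after
```

A check like this turns a 2 a.m. debugging session into a one-line error message at the start of the run.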


Do you have a data quality alliance strategy?

Whether you’re embarking on a data quality mission for the first time or you’re already a well-known presence, it never hurts to have allies throughout your organisation. By finding and winning over these supporters, you can build influence and achieve your data quality goals. It may be difficult given the many intersecting groups and initiatives, but the results are well worth the effort.

When I started working in data quality more than 20 years ago, the big problem I faced was getting traction in a small organisation. I knew we needed to make improvements, but it was hard to get the mandate from senior management. Read More »


Healthcare, big data and big frustrations

As part of the Affordable Care Act, many Americans have had to change insurance providers or plans. I’m old enough to realize that sweeping changes like this legislation will surely face many legal, technological and financial obstacles; I've even talked about some of these issues before. Suffice it to say, I didn’t expect a new policy that affects every American to be carried out without significant challenges. I just never anticipated that basic communication would be one of them.

Let me explain. Read More »
