Data quality in the real world

If you work in data quality long enough you’ll meet detractors of data quality software. The viewpoint in this camp is that poor quality data should be driven out at the time of design, not retrospectively detected and fixed. They perceive data quality tools as a costly overhead, something that is superfluous in a well-designed information landscape.

In a perfect world, perfectly designed systems would have defect prevention built in. All of the various design authorities would share the same vision for absolute quality management and developers would follow a strict "data quality rule book" and never create code that allowed defects to emerge.

The ethics of algorithmic regulation

In my last three posts on data ethics, I explored a few of the ethical dilemmas in our data-driven world. From examining the ethical practices of free internet service providers to the problem of high-frequency trading, I've come to realize the depth and complexity of these issues. Anyone aware of these issues will want answers. Some have suggested government intervention, but is it even possible for governments to regulate such fast-moving, rapidly evolving technology?

Evgeny Morozov, author of the book To Save Everything, Click Here: The Folly of Technological Solutionism, recently wrote an interesting Guardian article, The Rise of Data and the Death of Politics, that explored the ethical implications of the new data-driven approach to governance known as algorithmic regulation.

What is reference data?

The question in the title of this blog post seems sort of unnecessary, right? Everybody knows what reference data is, so it may seem silly to try to define it. Yet the very presumption of ubiquitous understanding of the concept of reference data hides one of its biggest challenges: although we all think we understand reference data, few organizations have specific roles assigned for accountability over it.

The result is that there is no control over reference data, leading to numerous teams and projects creating, redefining and reinterpreting the semantics of enumerations of values under the rubric of “reference data.” The absence of coordination and enterprise control becomes an issue down the pike when data sets from different sources are brought together, and the differences in reference data values pose difficulties in data integration and consolidation.

To be able to assert control, then, we might start with a definition of reference data so those data sets can be segregated from ones that are directly “owned” within an operational domain or business function. At his reference data portal website, Malcolm Chisholm defines reference data:

"Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise."
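
Chisholm's definition is easiest to see in concrete form. The sketch below (the table contents and names are purely illustrative, not from any particular system) contrasts a reference data set — a small, controlled enumeration used solely to categorize other records — with the transactional data it classifies, and shows the kind of uncontrolled drift that surfaces when no one owns the enumeration:

```python
# A reference data set: a small, slowly changing enumeration whose only job
# is to categorize other data. Codes and descriptions here are illustrative.
COUNTRY_CODES = {
    "US": "United States",
    "GB": "United Kingdom",
    "DE": "Germany",
}

# Transactional ("owned") data from an operational domain. The country_code
# column carries no meaning on its own -- it points into the reference set.
orders = [
    {"order_id": 1001, "amount": 250.0, "country_code": "US"},
    {"order_id": 1002, "amount": 99.5, "country_code": "GB"},
    {"order_id": 1003, "amount": 40.0, "country_code": "UK"},  # not in the enumeration
]

def invalid_reference_values(rows, column, reference):
    """Return rows whose reference-data column is not in the controlled enumeration."""
    return [r for r in rows if r[column] not in reference]

bad = invalid_reference_values(orders, "country_code", COUNTRY_CODES)
print(bad)  # the "UK" row -- a team reinterpreted the enumeration on its own
```

The "UK" versus "GB" clash is exactly the kind of semantic divergence that later poisons data integration and consolidation when sets from different sources are brought together.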

How to keep your data migration project on the rails

Why do so many data migration projects fall off the rails? I’ve been asked this question a lot and whilst there are lots of reasons, perhaps the most common is a bias towards finding the wrong kind of data quality gaps.

Projects often tear off at breakneck speed, validating and cleansing all manner of data quality problems, without really understanding the big-picture issues that can call into question the entire migration strategy.
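
One way to picture a "big-picture" gap is a sketch like the following (all record types and mappings are hypothetical): before any field-level cleansing starts, check whether every legacy record type actually has a destination in the target model, because a gap here can invalidate the whole migration strategy rather than individual records:

```python
# A "big-picture first" migration check (all names hypothetical).
# Before cleansing field-level defects, verify that every legacy record type
# has somewhere to go in the target model.

legacy_record_types = {"CUSTOMER", "PROSPECT", "SUPPLIER", "BROKER"}

# Mapping agreed with the target system's design; note BROKER is missing.
target_mapping = {
    "CUSTOMER": "party",
    "PROSPECT": "party",
    "SUPPLIER": "vendor",
}

unmapped = legacy_record_types - set(target_mapping)
if unmapped:
    # Strategic gap: resolve this before any field-level cleansing effort.
    print(f"No target destination for legacy types: {sorted(unmapped)}")
```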

Why does this problem persist?

Data diversity

I have consulted on enough enterprise system implementations to know that there's anything but uniformity in how to roll out a new set of mature applications. I've seen plenty of different methodologies and technologies for relatively similar back-office systems (read: ERP and CRM). Of course, some were better than others, although the results were remarkably consistent, as I describe in Why New Systems Fail.

Technologies that handle what we now call big data differ from those internal applications on two fundamental levels. First, big data technologies are much less mature and, I would argue, relatively poorly understood by comparison. Second, while ERP and CRM applications are certainly essential to running myriad businesses, they house a relatively inconsequential amount of information. As Jordan Robertson writes in BusinessWeek:

The universe of data being generated and collected today is magnitudes larger than ever before. Companies are combing content that's online (blogs and social media) and offline (DMV and criminal records), as well as the growing amount of bits being spewed by the billions of Internet-connected devices (smartphones and thermostats). Computer-storage maker EMC estimates the amount of digital information in the world in 2020 will swell to 50 times what it is today.

And you thought that big data was, well, big today. We ain't seen nothin' yet.

The low ethics of high-frequency trading

Imagine if your ability to feed your family depended upon how fast you could run. Imagine the aisles of your grocery store as lanes on a running track. If you can outrun your fellow shoppers, grab food off the shelves and race through the checkout at the finish line, then your family gets to eat. If not, then your family starves.

This "survival of the fleetest" would obviously be considered an unethical way to determine whose family gets to eat today. So why should it be considered an ethical way to determine whose family gets to invest for tomorrow?

In my last two posts on data ethics, I explored the questionable practices of free internet service providers and users. This post looks at ethical problems in a new area: the multibillion-dollar world of high-frequency trading, which is responsible for almost 99% of trades on the U.S. stock market. High-frequency trading works like this: firms with access to high-speed fiber-optic cables run advanced algorithms that buy and sell shares in milliseconds. Those cables, however, cost millions of dollars per year, meaning only the wealthy elite can afford to use them.
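
The mechanics behind that advantage can be reduced to a toy sketch (all numbers invented for illustration): whichever order reaches the exchange first gets the trade, every time, so a fixed latency edge is decisive no matter how often the race is run:

```python
# A toy illustration (numbers invented) of why a latency edge dominates:
# the order that reaches the exchange first wins the trade, every time.
def arrival_time(reaction_ms, link_latency_ms):
    """Time for a trader's order to reach the exchange after a price change."""
    return reaction_ms + link_latency_ms

# An HFT firm on a dedicated fiber link vs. a retail investor on the open internet.
hft = arrival_time(reaction_ms=0.1, link_latency_ms=0.5)        # well under 1 ms
retail = arrival_time(reaction_ms=200.0, link_latency_ms=30.0)  # hundreds of ms
print("HFT wins" if hft < retail else "Retail wins")  # prints "HFT wins"
```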

Cracking the code to successful conversions: establish testing and QA approach

Testing, testing... do we really need testing? The answer is YES! Always! The big questions are:

1. How do I test?
2. What do I test?
3. Do we just do program testing, or do we also test data quality?
4. What about volume testing?
5. Who signs off after test completion?
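
Questions 3 and 4 above can be made concrete with a minimal sketch (table contents and column names are hypothetical): beyond program testing, reconcile record volumes and spot-check data quality between source and target before anyone signs off:

```python
# A minimal post-conversion check (data and names hypothetical): reconcile
# volumes and mandatory-field population between source and target.

source_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
]
target_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
]

# Volume testing: did every source record arrive?
assert len(target_rows) == len(source_rows), "row counts differ"

# Data quality testing: are mandatory fields populated to the same degree?
def null_count(rows, column):
    return sum(1 for r in rows if r[column] is None)

assert null_count(target_rows, "email") == null_count(source_rows, "email")
print("reconciliation checks passed")
```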

5 common data quality project mistakes (and how to resolve them)

Over the course of the last eight years, I've interviewed countless data quality leaders and learned so much about the common mistakes and failures they've witnessed in past projects.

In this post I wanted to highlight five of the common issues and give some practical ideas for resolving them:

#1: Not connecting data priorities to business priorities

One of the biggest data quality frustrations I’ve witnessed in the business community is a lack of focus on tangible business issues. Data quality improvement invariably takes a data-centric viewpoint. This focus can often alienate business people who just can’t make heads or tails of what the ‘data team’ are relaying back to them. On the flip side, data quality practitioners often pull their hair out because they can’t see why the business doesn’t ‘get it’.

To resolve this, you need to ensure that your data quality goals address real business issues that stakeholders have a personal stake in resolving.
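
One practical way to do that is to frame every finding in business terms from the start. The sketch below (all rule names, objectives, and figures are invented for illustration) tags each data quality rule with the business objective it serves and an estimated cost per defect, so the report speaks the stakeholders' language rather than the data team's:

```python
# A sketch (all figures and names invented) of reporting data quality
# findings in business terms: each rule carries the business objective it
# serves and an estimated cost per failing record.
rules = [
    {"rule": "missing_postcode", "objective": "reduce failed deliveries",
     "cost_per_defect": 4.50, "defects": 1200},
    {"rule": "duplicate_customer", "objective": "cut marketing waste",
     "cost_per_defect": 1.20, "defects": 8300},
]

for r in rules:
    impact = r["cost_per_defect"] * r["defects"]
    # Lead with the business objective and cost, not the data-centric rule name.
    print(f"{r['objective']}: estimated exposure ${impact:,.2f}")
```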

Better memory through data

We all lose things. Some of us are just better at finding them than others.

I had to remind myself of that fact the other night in Las Vegas. I went to dinner with a friend at Brio, an Italian restaurant in the Town Square shopping center on The Strip. As usual, I drove my car and dutifully parked it, but not right next to the restaurant. (This is an important fact.) For those of you who don't know, just about everything in Vegas is big. Big casinos, big parking lots, big stakes and sometimes big mistakes.

A few years ago, I forgot which hotel I had parked at. (Getting the floor wrong is inconvenient enough, never mind the wrong building!) Two hours later, I finally found my Acura on the third floor of Bally's Casino. I vowed then to always take a picture of a sign near where I park my car – e.g., A4 or C3. My iPhone makes that easy enough to do.

For some reason, though, I didn't follow my standard operating procedure this time – and I would soon pay the price.

Mapping ethics in a data-driven world

In my previous post, I examined ethics in a data-driven world with an example of how Facebook experiments on its users. Acknowledging the conundrum facing users of free services like Facebook, Phil Simon commented that “users and customers aren’t the same thing. Maybe users are there to be, you know... used.”

What about when a free service allows its users to reach local customers and direct them to their business location? That's the case with Google Maps. Interestingly, some of these users go a step further and try to use the free service to pull customers away from their competitors.

As Kevin Poulsen blogged, “Google Maps is, at its heart, a massive crowdsourcing project, a shared conception of the world that skilled practitioners can bend and reshape in small ways using tools like Google’s Mapmaker or Google Places for Business. Beneath its slick interface and crystal clear GPS-enabled vision of the world, Google Maps roils with local rivalries, score-settling, and deception. Maps are dotted with thousands of spam business listings for nonexistent locksmiths and plumbers. Legitimate businesses sometimes see their listings hijacked by competitors or cloned into a duplicate with a different phone number or website. In January, someone bulk-modified the Google Maps presence of thousands of hotels around the country, changing the website URLs to a commercial third-party booking site (which siphons off the commissions).”

These are the ethical dilemmas of cartography in the era of big data. Today, the maps we draw of our data-driven world depend on both geographical information and data points. However, these data points now come from a wide variety of sources, not all of which are interested in pointing you in the right direction.
