If you’re reading this, there’s a strong chance your organisation is on the road to data quality management maturity. One of the challenges you’ll inevitably face is how to deal with all the defects you discover.
Many data quality problems can be "cleansed" instantly using appropriate technology, but for many issues we simply don’t have enough context to determine what the correct value should be.
For example, if your data relates to the power industry, you can infer a great deal about a power plant’s rating, capacity and manufacturer from reference data – but much of the data is specific to the installation and operation of that particular kit.
The same applies to customer data. We may have conflicting address or contact details for a customer, but it’s not always easy to determine which version is accurate.
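As a rough illustration of the distinction (the field names, fix table and review queue below are invented for the example, not drawn from any particular product), a cleansing routine might separate the two cases like this:

```python
# Illustrative sketch: apply unambiguous, rule-based fixes automatically and
# queue ambiguous conflicts for human review. All field names are invented.

AUTO_FIXES = {"country": {"UK": "United Kingdom", "U.K.": "United Kingdom"}}

def cleanse(record, review_queue):
    """Apply safe fixes in place; flag cases that need human judgement."""
    for field, mapping in AUTO_FIXES.items():
        if record.get(field) in mapping:
            record[field] = mapping[record[field]]  # unambiguous: fix it now

    # Two conflicting addresses: there is no context to pick the right one,
    # so the record goes to a data steward instead of being "corrected".
    billing, mailing = record.get("billing_address"), record.get("mailing_address")
    if billing and mailing and billing != mailing:
        review_queue.append((record["customer_id"], "conflicting addresses"))
    return record

queue = []
cleanse({"customer_id": 42, "country": "UK",
         "billing_address": "1 High St", "mailing_address": "2 Low Rd"}, queue)
print(queue)  # -> [(42, 'conflicting addresses')]
```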
In data quality there are often shades of grey. How do we manage data quality improvements in this situation?
Conventionally, the approach is to perform major cleanses or campaigns to get the data up to an acceptable level. This comes at a cost and can be a big hit on the business if staff are spending all day reality-checking data.
There is another, more organic method that your organisation could employ. It’s not right for every situation, but it can work well for data that has low volatility and requires human judgement to correct.
"Big data isn't useful for investment purposes."
So said my friend Mike during one of our recent discussions (arguments, really). By way of background, Mike is not an über-successful 70-year-old investor who earned his chops well before the advent of Twitter, Facebook and their ilk. Rather, he's a man of a similar age to yours truly. We're college friends. To boot, Mike works for IBM, a company that has bet much of its future on big data.
Of course, I disagreed with him. (Mike and I have a history of animated debates across a number of issues. Case in point: I love LeBron; Mike hates him.) On the big data front, my primary argument was anecdotal. A little more than two years ago, I bought Apple near its record high. I was tired of missing out on the gains that respected analysts were promising. At that time, everything that Apple did helped its stock price.
Preventing bad investment decisions
Since the time of my purchase, Apple's stock has plummeted. (I'm convinced that my relatively small position singlehandedly drove down its price.) Kidding aside, Apple sentiment has shifted dramatically today. Nine million iPhone 5s units sold? Snooze. Sentiment is nothing short of horrible. Apple's P/E ratio is hysterically low, especially when you consider the mind-boggling $140 billion it holds in cash.
Each year, I'm excited to see the award nominations for Data Steward of the Year come in. It's not just because we enjoy seeing the program grow each year (which is true, based on the number of nominations we receive). It's also because of the variety of the nominations – and the impressive work that people are doing to manage data.
This year's field was incredibly deep. We had nominations from three continents and a variety of industries, and the data stewards in the competition worked for organizations both big and small. After our distinguished panel reviewed the nominations, we are excited to name Maureen Spence of Capital One as the 2013 Data Steward of the Year. Maureen is a Senior MIS Analyst Manager at Capital One. She leads a team that supports all Bank Data Store applications, as well as more than 1,000 direct and 10,000 indirect users.
In my last post, we started to look at some of the issues with the concept of “big data governance,” especially when a large part of governance is intended to prevent the introduction of errors into data sets. Many big data analytics applications focus on ingesting numerous, varied data sets acquired from external sources. By the time the data has been brought into the organization, it is basically too late to have any impact on the data creation process, so preventing errors is out the window.
In fact, the problem is much worse than that, for two reasons. First, in many cases the data sets being used are not only created by parties outside the administrative domain, but the internal users may also have no idea where the data came from at all. For example, public US federal transparency data sets published at www.data.gov are created solely for the purpose of posting the data to the web site, but the values populating those data sets may have come from numerous internal applications designed and implemented to support specific business functions. The resulting data sets are effectively created without any of the original context.
That means that the actual details of the originating system are completely lost, often including the technical/structural metadata (such as data types and lengths) as well as the more important business metadata such as data element definitions and reference data domains. The user of the data is compelled to manufacture the semantics based on intuition and context, but not much else.
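To see what that manufacturing of semantics often looks like in practice, here is a rough sketch (the file name and the inference rules are purely illustrative assumptions) of profiling a downloaded data set to reconstruct the technical metadata that never travelled with it:

```python
# Illustrative sketch: infer a plausible data type and maximum length for
# each column from the observed values alone, because the originating
# system's metadata was lost. Real profiling tools go much further.
import csv

def infer_column_profile(values):
    """Guess a data type and maximum length from observed values alone."""
    non_empty = [v for v in values if v != ""]
    if all(v.lstrip("-").isdigit() for v in non_empty):
        inferred = "integer"
    else:
        try:
            for v in non_empty:
                float(v)
            inferred = "decimal"
        except ValueError:
            inferred = "string"
    return {"type": inferred,
            "max_length": max((len(v) for v in values), default=0)}

# "downloaded_dataset.csv" is a hypothetical file pulled from an external site.
with open("downloaded_dataset.csv", newline="") as f:
    rows = list(csv.DictReader(f))
for column in rows[0]:
    print(column, infer_column_profile([r[column] for r in rows]))
```

Note that even a perfect profile of this kind recovers only the structural metadata; the business meaning of each column still has to be guessed from context.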
I was speaking to one of our UK members last week who is responsible for data quality in a large public sector organisation. When we got talking about the most useful techniques she has drawn on to mature data quality in her organisation, I got one of the most common responses I hear from senior data quality practitioners: they have had to master the art of internal sales.
Many people mistakenly perceive successful selling as an innate character trait that some lucky people are born with. I think most of us, given the choice, would prefer not to be placed in a position where persuasive selling is required. The truth is, whether you’re selling home insurance or data quality, salesmanship is a skill that anyone can develop. When mastered, it can become one of the most essential techniques in your data quality leadership toolkit.
One of the principal sales techniques of the data quality leader is to focus on fear: not fear of data quality management itself, but of its absence. Human beings are motivated more by fear than by gain, which is why in recent decades we’ve become obsessed with buying anti-bacterial soaps, for example.
Typical fear-based selling of data quality may look something like these sales pitches:
- “If we don’t implement a data quality project, we’ll be in breach of regulatory controls.”
- “If you don’t buy a data quality tool on your next integration project, it will come in late and over budget.”
- “If you don’t implement enterprisewide data quality management, then we can’t mature our data strategy.”
The problem with leaning on fear is that too much of it can paralyse our senior sponsors. We introduce so much risk and doubt in their minds that they simply can’t move forward with any kind of decision, least of all the one we really want – which is some positive data quality action.
How do employees feel about data today?
It's an interesting question, and one that I address in my forthcoming book The Visual Organization: How Intelligent Companies Use Data Visualization To Make Better Decisions. As a student of data, management and technology, I find it obvious that our relationship with data has changed over the last decade – and I'm not the only one who feels that way.
Jim Davis notes a few of these changes in his post about the view from the corner office:
- More of our employees want access to data than ever before.
- They want answers faster than before, too.
- They want the data in a format they can quickly understand and share with others.
In other words, employees want data that is more transparent, visual and shareable. Organizations like the University of Texas recognize these needs and have taken steps to empower their employees, students and the public at large.
In a previous post, I urged you to prevent the spread of the data zombie virus. However, not all viruses are bad. In fact, there are even viral outbreaks that can be good for your organization.
One of my favorite books is Unleashing the Ideavirus, where author Seth Godin explains how the most effective business ideas are the ones that spread. Godin uses the term ideavirus to describe an idea that spreads, and the term sneezers to describe the people who spread it.
Data quality can be a hard sell when you’re making the business case, which is why touting the success of a recently completed pilot project will make that case more compelling. Therefore, carefully choose a pilot project that is almost guaranteed to succeed. For this, you need the following three things:
- Well-defined ROI – Returns on investment are tangible business impacts, such as mitigated risks, reduced costs or increased revenues. Alternatively, Daragh O Brien suggests using ROM (return on Maalox).
- Narrow scope – A pilot project that delivers ROI without too much time, effort and cost. While this isn’t easy to find, don’t let scope creep, driven by the fear that your ROI isn’t big enough, thwart you.
- Sneeziness – A pilot project that is either inherently sneezable or one that has a great sneezer acting as your executive sponsor or project champion.
In my last post, I noted two key issues that arise when we seek to impose governance over large-scale data sets imported from outside the organization: the absence of control and the absence of semantics. Of course, we cannot just throw up our hands and say that the data is ungovernable. Rather, we have to examine what the intent of governance is in light of these constraints.
One approach is to reframe the question, leading to some alternative approaches to governance. Instead of considering governance as a way of controlling the creation and processes that touch data within the production cycle, consider governance as a means for controlling expectations regarding consumption and usability of the data.
This is a more practical approach, especially considering that, in most cases, big data reports and analyses are not likely to be slowed or halted by questions about the processes used to create the source data. In addition, many big data environments may be designed to stream data from real-time semi-structured or unstructured sources that either have no predefined metadata or undergo structural changes so rapid that rules about formats and structure cannot be presupposed.
The orientation I am suggesting in this post covers two facets of data utilization. Consumption looks at the business scenarios in which the big data environment is used and what the expectations are from a high-level functional perspective. Usability refers to the degree to which the expected outcomes are skewed as a result of data issues and what the users’ level of tolerance is to that skew.
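As a rough sketch of that usability framing (the consumer names, tolerances and expectation rule below are hypothetical, invented for illustration), you could measure the observed skew and compare it against the tolerance each consumer has declared:

```python
# Illustrative sketch: governance as expectation-setting rather than control.
# Measure how far the data deviates from what a consumer expects, then check
# that skew against each consumer's declared tolerance.

def skew(records, expectation):
    """Fraction of records that fail a consumer's expectation."""
    failures = sum(1 for r in records if not expectation(r))
    return failures / len(records) if records else 0.0

# Each consuming application declares its own tolerance for skew
# (names and thresholds are hypothetical).
CONSUMER_TOLERANCE = {"exec_dashboard": 0.02, "exploratory_analytics": 0.15}

records = [{"amount": 100}, {"amount": None}, {"amount": 250}]
observed = skew(records, lambda r: r["amount"] is not None)

for consumer, tolerance in CONSUMER_TOLERANCE.items():
    verdict = "usable" if observed <= tolerance else "below tolerance"
    print(f"{consumer}: skew={observed:.2%} -> {verdict}")
```

The design point is that the same data set can be perfectly usable for one consumer and unacceptable for another; the governance artifact is the declared tolerance, not a rule imposed on the data's creation.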
Whilst the success rate of data migration initiatives has climbed in recent years, I still find that one of the key goals of data migration, legacy decommission, often gets overlooked.
The financial benefits of shutting down the legacy environment are many. Relinquishing licenses and retiring dormant hardware are the obvious advantages of legacy decommission, but there are others, too. For example, reducing the complexity of your data landscape and eliminating ongoing administration are clearly beneficial.
So why do legacy environments live on beyond their planned decommission deadline?
Part of the reason is that, surprisingly, there is no decommission deadline! Often there is no plan at all for completing the decommissioning process. All the focus is on the target system load and go-live. As a result, the legacy environment persists, and everyone is too nervous to complete the shutdown.
How can you improve on this situation?