What is reference data harmonization?

A few weeks back I noted that one of the objectives of an inventory process for reference data is data harmonization: determining when two reference sets refer to the same conceptual domain and harmonizing their contents into a conformed standard domain. Conceptually it sounds relatively straightforward, but as with most data management techniques, its apparent simplicity hides a significant amount of complexity.
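To make the idea concrete, here is a deliberately naive sketch in Python. It assumes that two reference sets cover the same conceptual domain when most of their normalized values overlap, and it builds the conformed domain as the union of those values. The function names, the 0.8 threshold, and the sample status codes are all hypothetical, chosen only for illustration; real harmonization would also have to weigh meanings, synonyms, and code hierarchies, which is where the complexity comes in.

```python
from typing import Collection, List, Optional


def normalize(value: str) -> str:
    """Lowercase the value and collapse leading, trailing, and repeated whitespace."""
    return " ".join(value.casefold().split())


def overlap_ratio(set_a: Collection[str], set_b: Collection[str]) -> float:
    """Share of the smaller set's normalized values that also appear in the other set."""
    a = {normalize(v) for v in set_a}
    b = {normalize(v) for v in set_b}
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def harmonize(
    set_a: Collection[str],
    set_b: Collection[str],
    threshold: float = 0.8,  # hypothetical cut-off, not a recommended value
) -> Optional[List[str]]:
    """Return a conformed domain if the sets look like the same domain, else None."""
    if overlap_ratio(set_a, set_b) < threshold:
        return None
    return sorted({normalize(v) for v in set_a} | {normalize(v) for v in set_b})


# Hypothetical reference sets from two systems.
crm_status = ["Active", "inactive ", "Pending"]
billing_status = ["ACTIVE", "Inactive", "pending", "Closed"]

conformed = harmonize(crm_status, billing_status)
print(conformed)
```

Value overlap is only one possible signal. Two sets can share values and still mean different things, so a check like this is a starting point for review rather than a decision.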

Big data, Hadoop, and the Internet of Things walk into a conference

The panel moderator looks out over the audience. It’s a large crowd. For the first time ever, Big Data, Hadoop, and the Internet of Things are appearing on stage together. The conversation has just begun, so let’s listen in for a minute.

Big Data: “…and people have been trying to define me for years. No one seems to agree on who or what I am, but folks, here I am. Look at me: I am not 3V’s. I am so much more!”

Moderator: “Indeed you are, Big Data, and we are very happy to have you here with us this morning. Hadoop, want to introduce yourself?”

Hadoop: “Hi, my name is Hadoop and I recently turned eight years old.”

Moderator: “Happy birthday!”

The celebrity of data: Big data goes big time in your organization

We were once oblivious to data. It was in the background. Just noise. The “byproduct” of applications that we used every day. A nuisance that screwed up every system migration or install.

Now, we wonder, who’s seeing our data? How might they use it? We constantly check and review our Facebook privacy settings. Can our data have an impact on our business and personal relationships? Have you ever Googled yourself to see what others can find on you?

How to re-frame your data quality elevator pitch

If you work in a data quality team then chances are you’ll experience that awkward moment when someone in your organization asks the obvious question:

"So what does a data quality team do?"

Most people (outside of data quality) find this a relatively straightforward question to answer, but it always strikes me at events and industry meetups just how many people struggle to convey the importance and function of their data quality role.

Better videos through data?

A few years ago, I hosted a webinar for students at Full Sail University. I discussed my third book, The New Small. After I covered my material, I opened up the floor for questions like this one:

A seasonal perspective on a single version of the truth

Yesterday was one of the two times a year that an equinox occurs. From its Latin roots, the term equinox translates as equal night since, on the day of an equinox, daytime and night are of approximately equal duration. This occurs because during an equinox the Sun is aligned with the center of the Earth.

An equinox also marks the changing of the seasons. What seasons, however, depends on your perspective. If you live in the Northern Hemisphere, yesterday marked the end of summer and the beginning of autumn, making it the autumnal equinox from your perspective. Whereas, if you live in the Southern Hemisphere, yesterday marked the end of winter and the beginning of spring, making it the vernal equinox from your perspective.

So depending on which side of the planet you live on, autumn starts in either September or March. Or if you live somewhere along the Equator, such as in Indonesia, then autumn never starts, because the seasons never change.

The celebrity of data: Taking data to the mainstream

[ce·leb·ri·ty], noun. the state of being well known

Media exposure, good or bad, is the surest way to gain celebrity. Just ask any child actor gone bad in Hollywood. They know. Lately data has been getting more than its fifteen minutes of fame. And good or bad, I think it’s awesome. We’re at a tipping point when it comes to data. From the movies we see to the news we read, we can’t escape data. It’s part of our everyday lives.

Here are some ways that data is shaping how we see the world around us.

The movies: Moneyball. If you are a data geek like me, you had to love Moneyball. Billy Beane hires a stats geek who has a “new” way of using data (information, coincidentally, that the team already has) to pick players who cost less and will help win games. On-base percentages and slugging averages turn out to be better predictors of a team’s offensive success than batting averages, runs batted in (RBI) and stolen bases. This movie, about data of all things, won awards!

Are you a data migration sponsor? A reminder of your responsibilities.

Data migrations are never the most attractive of projects to sponsor. For those who have sponsored them previously, migrations can be seen as a poisoned chalice. As for the first-timers, data migration initiatives are often perceived as a fairly insignificant part of a far grander production.

The challenge with data migration projects, of course, is that few organisations do them regularly, so there is often a dearth of technical ability internally and even less within the sponsor community. As a result, project sponsors often have no idea what their role entails because there is no one to seek advice from.

This can be compounded by external suppliers who often claim they’re a "one-stop shop" for the migration. The reality, of course, is that hidden in the fine print of your contract are some hazy requirements around "data extraction," "data preparation," "file delivery," "data quality requirements," "extraction specification" or any number of get-out clauses for suppliers and third parties.

The Big Lebowski, dashboards, and Twitter

Like most of the bloggers for this site, I am active on Twitter. Over the past six years, I have tweeted more than 20,000 times.

Sounds like I have no life, eh?

Well, maybe, but do the math. I average about ten tweets per day. If you're trying to connect with others and occasionally promote a book or six, then that number starts to seem a little less extreme.
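If you would rather let Python do the math, here is a quick check. It assumes 365-day years and treats the 20,000 figure as a floor, since the count above is "more than 20,000". The function name is made up for this example.

```python
def average_per_day(total: int, years: float, days_per_year: int = 365) -> float:
    """Average count per day over a span of years (leap days ignored)."""
    return total / (years * days_per_year)


tweets_per_day = average_per_day(20_000, 6)
print(f"{tweets_per_day:.1f} tweets per day")
```

That comes out a little above nine a day, so "about ten" is a fair rounding once the tweets beyond 20,000 are counted.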

Errors, lies, and big data

My previous post pondered the term disestimation, coined by Charles Seife in his book Proofiness: How You’re Being Fooled by the Numbers to warn us about understating or ignoring the uncertainties surrounding a number, mistaking it for a fact instead of the error-prone estimate that it really is.

Sometimes this fact appears to be acknowledged when numbers are presented along with a margin of error.

This, however, according to Seife, is “arguably the most misunderstood and abused mathematical concept. There are two important things to remember about the margin of error. First, the margin of error reflects the imprecision caused by statistical error—it is an unavoidable consequence of the randomness of nature. Second, the margin of error is a function of the size of the sample—the bigger the sample, the smaller the margin of error. In fact, the margin of error can be considered pretty much as nothing more than an expression of how big the sample is.”
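Seife's second point is easy to see with the standard textbook approximation for a sampled proportion. The sketch below is not taken from his book. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case proportion of 0.5, and the function name is my own.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes the normal approximation: z * sqrt(p * (1 - p) / n).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (100, 1_000, 4_000):
    print(f"n={n}: +/- {margin_of_error(n):.1%}")
```

With the other inputs held fixed, sample size is the only thing that moves the result, and quadrupling the sample merely halves the margin: roughly 3.1 points at 1,000 respondents versus about 1.5 at 4,000. Nothing in the formula accounts for errors that are not statistical, such as a biased sample or a badly worded question.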
