Want to improve data quality? Start by re-imagining your data boundaries.

Some years ago I was consulting in a large financial institution. I was brought in to help migrate the company from one financial classification scheme to a new one. The project manager assumed this was a three-month project.

He was mistaken.

Nearly six months into the project we realized this was no small feat. The legacy classification had been shared hundreds of times, moving across countless organizational boundaries.

The big problem lay with the way information was supplied by the provider of the classification. The data was emailed or shipped by CD – and so, of course, it became virtually impossible to track the proliferation of its contents across the organization. Some departments were up-to-date with the latest standard; others were far behind.

At another organization, a utilities provider, the company received pricing information from a third-party national supplier each month. Once again, the data would enter the organization, be heavily cleansed and transformed, and then be distributed across the organization.


Top 5 benefits of managing data where it is

Data. We now have lots of it. Everywhere. Historically, before we tried to do too much to manage any of it, we first moved data to a central location (e.g., the staging area for an enterprise data warehouse). This blog post touches on the top five benefits of managing data where it is (e.g., in-cloud, in-database, in-memory, in-stream).

Minimize data movement

In addition to staging it for data management processes, another common reason for moving data is to make a local copy of it. The proliferation of those copies is the data silo challenge most organizations are mired in. However, most of what are referred to as data silos are actually application silos, because data and applications became tightly coupled as applications were built around where the data was moved to and manipulated (i.e., cleansed, transformed, deduplicated and structured). Storing data in one easily accessible location (e.g., in the cloud or in Hadoop) and building services around it, instead of moving it, is a best practice more and more organizations are moving toward.


Data beyond boundaries: Reexamining the role of the data-governance council

The theme on the Data Roundtable this month is managing data beyond boundaries.

You'll get no argument from me on the import of this subject, especially today. After all, we long ago entered the era of big data. As such, it's imperative to look anew at traditional data-related concepts, roles and rules. Which are still relevant? Which need to adapt or die?

I can think of no entity that should take a harder look in the mirror than the conventional data-governance council.


Data integration in the cloud

I feel like I'm singing a song called Data in the Sky – With Options! The cloud is forever in our minds these days as a lower-cost option because it requires fewer resources to address our data needs. Cloud solutions are an increasing part of many organizations' budgets every year.

Whether enterprise data is in the cloud or in some computer center on RAC, issues around data integration still persist in almost every organization. For example, data movement and integration are required in a cloud implementation just as they are in other implementations. And if our data volume is enormous, the cloud may not be the answer. We need to look at the cloud as just one solution to help meet our data needs.


The Integration of Things

The Internet of Things (IoT) has become the new It Girl of the IT world. Of course her big brother big data continues to generate big buzz. My sis from another miss Tamara Dull has blogged about the relationship between big data and IoT, positing that big data is a subset of IoT on the basis that “big data is about data, plain and simple,” whereas “IoT is about data, devices and connectivity.”

Connecting devices to the Internet is not the only thing that sets IoT apart from big data. As of 2008, IoT was already a big part of creating big data, since by then there were more devices connected to the Internet than people. And Cisco predicts that by 2020 IoT will have 50 billion connected devices. IoT has already made progress in many industries, including healthcare, manufacturing, energy and retail. IoT has also already hit the mass market in the form of consumer electronics and household appliances enabled by embedded software and sensors to collect and exchange data.


Adapting to the Silo 3.0 challenge

Working on a data migration project gives you a unique opportunity to learn where your organization has fallen short in its data management strategy. It's when you start to explore your legacy data landscape that you get a feel for how big a silo challenge your company has.

It wasn't always this way, of course.

In the 1960s, organizations mostly relied on one centralized system to manage the computing function. It may have required a room the size of your house, but having all your eggs in one basket was the order of the day. Let's call these earliest beginnings Silo 1.0. After all, in most organizations, there was only the one silo.


Using big data techniques to increase database performance

Many people perceive big data management technologies as a “cure-all” for their analytics needs. But I would be surprised if any organization that has invested in developing a conventional data warehouse – even on a small scale – would completely rip that data warehouse out and immediately replace it with a NoSQL database or a Hadoop cluster. More likely, early adopters would create a hybrid environment that incorporates a variety of technologies for specialized purposes.

Alternatively, you could adapt different technologies to support and improve existing platform componentry. An interesting scenario is using Hadoop to augment storage for an existing data warehouse. Despite the size of many enterprise-class data warehouses, most of the queries access only a small percentage of the data. Here's a practical example. A team member recently did a study of a number of queries against a data warehouse comprising approximately 150 tables. He reported that over a specific time period, more than 67% of the queries touched only three tables, while more than 95% touched a total of eleven tables and 99% of the queries touched twenty-four tables. This showed that most of the accesses touched a very small percentage of the tables in the entire data warehouse.
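The kind of measurement described above can be sketched in a few lines of Python. This is only an illustration, not the method the team member actually used: it assumes you have already extracted, for each query in the log, the set of tables it touches, and the function name `coverage_by_top_tables` and the sample table names are made up for the example.

```python
from collections import Counter


def coverage_by_top_tables(query_tables, k):
    """Return the fraction of queries whose tables all fall within
    the k most frequently referenced tables (the "hot" set)."""
    if not query_tables:
        return 0.0
    # Count, for each table, how many queries reference it.
    counts = Counter(t for tables in query_tables for t in set(tables))
    hot = {table for table, _ in counts.most_common(k)}
    # A query is covered only if every table it touches is hot.
    covered = sum(1 for tables in query_tables if set(tables) <= hot)
    return covered / len(query_tables)


# Hypothetical query log: one set of table names per query.
query_log = [
    {"orders"},
    {"orders", "customers"},
    {"orders", "customers"},
    {"customers"},
    {"orders", "products"},
    {"audit_2009"},
]

for k in (2, 4):
    share = coverage_by_top_tables(query_log, k)
    print(f"top {k} tables cover {share:.0%} of queries")
```

Running a calculation like this for increasing values of k shows how quickly coverage saturates, which in turn suggests which tables are rarely touched and might be candidates for cheaper storage.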


Managing big data, Part 2: Human questions and considerations

In my previous post, I listed some of the technical questions and considerations that big data introduces. Truth be told, though, making sense out of big data requires much more than deploying new technologies, adding new services, tweaking database partitioning and ETL jobs, and upping compute power.

No, finding the signal in the noise that is big data is just as much a human issue as a technical one. As I learned early on in my career, technical chops by themselves only get you so far. The smartest person in the room may know the right answer, but can s/he effectively communicate it?

In this post, I'll address some of the human considerations broached by big data.


Know a data steward who's scary good? Nominate them for a Stewie.

There's a sense of foreboding and uncertainty. You look around, but you're uncertain where to go next. What to do. Or who to turn to. Ultimately, it feels like an eerie calm before the storm. And you get that creepy feeling that something awful is just around the corner.

I might be describing a haunted house at Halloween. Or, I could be talking about your data management strategy now that big data has turned your strategies and practices upside down. Either way, you are probably uncertain, unsure and (maybe) a little scared of what's on the horizon.

That's why every year we celebrate International Data Stewards Day. On this holiday, we celebrate the brave ones among us who look the data monsters in the face and bring them to heel. They don't need a wooden stake or a silver bullet. No, they just need the steely-eyed calm gained from years of experience in the data trenches. That gives them an unmatched understanding of how their data works – and how things can get better.

How can you get involved? Here are four easy things you can do:


Fields and dreams of data warehousing

“Field of dreams warehouse” – a historic phrase I used in the early days of data warehouse development. It describes the frenzy of activity that took place to create enterprise data infrastructure, before the business rationale for the data use was even understood. Those were the early days.

In some ways the Hadoop rush has parallels, beginning in a similar fashion. Many organizations have switched on Hadoop clusters to collect any and all data that may possibly have value, because they know they can sift through the noise with technology to discover the golden nuggets. A significant difference, however, is that Hadoop provided an immediate cost reduction strategy and so has had the benefit of a more tangible, measured value.

But a predefined, clear data strategy was generally not at the forefront of either of these initiatives. Build it and they will come … well, not necessarily. Store it and we’ll find use for it later … maybe so, but perhaps not.
