Data prep should not be a one-time exercise

Intelligent organizations realize that data preparation should not be a one-time exercise. Here's the story of one organization that didn't get it.



Is data quality a component of data preparation? Or vice versa?

Critical business applications depend on the enterprise creating and maintaining high-quality data. So, whenever new data is received – especially from a new source – it’s great when that source can provide data without defects or other data quality issues.

The recent rise in self-service data preparation options has definitely improved the quality of data from some sources. That's especially the case with self-service portals that allow customers to maintain, for example, their current postal addresses, email addresses, phone numbers and preferred contact method. Of course, such options are not available for every source. And even when they are, they don't eliminate the need to build and maintain automated data quality processes, such as those incorporated into enterprise data warehouses and master data management hubs.
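To make the idea of an automated data quality process a bit more concrete, here's a minimal Python sketch of the kind of contact-data checks that might run during a warehouse or MDM load. The field names, validation rules and sample record are illustrative assumptions, not a reference to any particular product.

```python
import re

# Minimal sketch of an automated contact-data quality rule.
# Field names and rules are illustrative assumptions, not a real schema.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s\-().]{7,20}$")

def check_contact_record(record: dict) -> list[str]:
    """Return a list of data quality issues found in one customer record."""
    issues = []
    if not record.get("postal_address"):
        issues.append("missing postal address")
    if not EMAIL_RE.match(record.get("email", "")):
        issues.append("invalid or missing email")
    if not PHONE_RE.match(record.get("phone", "")):
        issues.append("invalid or missing phone number")
    if record.get("preferred_contact") not in {"email", "phone", "mail"}:
        issues.append("unknown preferred contact method")
    return issues

# Example: flag records that still need attention even after self-service updates.
record = {"postal_address": "", "email": "pat@example.com",
          "phone": "919-555-0100", "preferred_contact": "email"}
print(check_contact_record(record))   # ['missing postal address']
```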


Streaming data and wearables: Allies against rising health care costs

It’s nearly impossible to avoid the debate. From political and pundit commentary to dinner table discussions across the United States, the hot topic for the last several years has been the rising cost of health care.

Consider that health care expenditures in the US were $3 trillion in 2014 and are expected to rise by 5.2% annually. Contentious deliberations continue about the reasons for our soaring health care costs and the best ways to lower them. But there’s one thing most people agree on: Preventive care is a more desirable, lower-cost option than patient treatment.
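To put that growth figure in perspective, here's a quick back-of-the-envelope projection in Python that simply compounds the quoted $3 trillion base at the quoted 5.2% annual rate. It's an arithmetic illustration of the trend, not a forecast.

```python
# Compound-growth illustration using only the figures quoted above:
# $3 trillion in 2014, growing 5.2% per year.
base_year, base_spend_trillions, annual_growth = 2014, 3.0, 0.052

for year in range(2014, 2025, 2):
    projected = base_spend_trillions * (1 + annual_growth) ** (year - base_year)
    print(f"{year}: ${projected:.2f} trillion")
# 2014: $3.00 trillion ... 2024: about $4.98 trillion, if the 5.2% rate held.
```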



Data preparation strengthens Hadoop information chain

Hadoop has driven an enormous amount of data analytics activity lately. And this poses a problem for many practitioners coming from the traditional relational database management system (RDBMS) world.

Hadoop is well known for having lots of variety in the structure of data it stores and processes. But it's fair to say there is additional variation in the control and quality of the information chains that feed into and out of a typical Hadoop installation.


What can a garage storage system, kayak and surfboard teach you about data modernization?

Two years ago, I found myself the proud, first-time owner of a garage. My wife and I quickly started to add new items to the garage – a battery-powered lawn mower, two beach cruisers and four Tommy Bahama beach chairs. They were stored with ease. What a fantastic world I'd been missing out on. But it wasn't long before we outstripped our existing garage storage system's (GSS) capacity by adding new items like a surfboard, soccer goals and kid bikes. I could squeeze the bikes in there, but the surfboard was much larger than anything else I had stored in this system. Storing and retrieving objects soon became complicated, frustrating and time-intensive.

Next, I found myself fascinated by Rubbermaid FastTrack, a GSS that consists of numerous three- to six-foot bands of steel that you bolt onto the studs in your garage walls. You then hang overpriced hooks, ball bins and bike holders from these steel bands.

As I was hanging these tracks in the 95-degree heat last summer, I had fever dreams of how this garage storage problem was similar to the challenges of data modernization. And how, like homeowners, organizations struggle to modernize their data architecture to accommodate new and varying types and sizes of data at unprecedented rates.


Data prep considerations for analytics, Part 2

In my last post, I covered some of the first data preparation questions to ask when going down the analytics road. I was just getting started, though. There are plenty more things to consider in this vein.



Clean-up woman: Part 2

In my last post, I talked about how data still needs to be cleaned up – and data strategy still needs to be re-evaluated – as we start to work with nontraditional databases and other new technologies.

There are lots of ways to use these new platforms (like Hadoop). For example, many organizations are using a data lake as a staging area for minimally processed data. That data lake may also be the place to analyze and profile data – not only for data quality, but for the integrity and completeness of the data as it moves between processes. Think of it as an audit area. Maybe we made a change to the configuration of a source system, and we want to make sure the data coming out of that system is processed according to the testing and/or use scenarios.
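To illustrate that audit-area idea, here's a minimal Python sketch that reconciles a source extract against what actually landed in the data lake staging zone. The file paths, table and column names are hypothetical placeholders, not a specific environment.

```python
import pandas as pd

# Compare what a source system handed off with what landed in staging.
# Paths and column names below are hypothetical placeholders.
source_extract = pd.read_csv("source_system/orders_extract.csv")
landed = pd.read_parquet("datalake/staging/orders/")

audit = {
    "source_rows": len(source_extract),
    "landed_rows": len(landed),
    "missing_rows": len(source_extract) - len(landed),
    "landed_null_order_id": int(landed["order_id"].isna().sum()),
    "landed_dup_order_id": int(landed["order_id"].duplicated().sum()),
}
print(audit)
# Any nonzero missing, null-key or duplicate-key count is a signal that the
# upstream change isn't behaving the way the test scenarios expected.
```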


Data prep considerations for analytics, Part 1

I'm hard-pressed to think of a trendier yet more amorphous term today than analytics. It seems that every organization wants to take advantage of analytics, but few really are doing that – at least to the extent possible. This topic interests me quite a bit, and I hope to explore it more in the fall, when I'll be teaching Enterprise Analytics as I start my career as a college professor.

But analytics (in all of its forms) is predicated on, you know, data. And, as we know from this blog, data is often not ready to be released into the wild. Many times, machines and people need to massage data first – often to significant degrees. Against that backdrop, in this post and its successor, I'll list some key data preparation questions to think about as they pertain to analytics.


Data cataloging for data asset crowdsourcing

What does it really mean when we talk about the concept of a data asset? For the purposes of this discussion, let's say that a data asset is a manifestation of information that can be monetized. In my last post we explored how bringing many data artifacts together in a single repository enabled linkage, combination and analysis that could lead to profitable business actions.

On the one hand, the more data that's available, the better chance there is for combining multiple artifacts in ways that can be monetized. But at the same time, the more data there is to search through, the more difficult it is to figure out what you need, whether it exists and how it can be used.
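As a rough illustration of how a catalog eases that search problem, here's a minimal Python sketch of keyword search over data asset metadata. The catalog entries and their fields are invented for the example; a real data catalog would hold far richer metadata.

```python
# Once each data asset carries searchable metadata (name, description, tags,
# owner), finding candidates to combine becomes a lookup instead of a hunt.
# The entries below are made up for illustration.
catalog = [
    {"name": "customer_master", "tags": {"customer", "mdm", "contact"},
     "owner": "data-governance", "description": "Golden customer records"},
    {"name": "web_clickstream", "tags": {"customer", "behavior", "web"},
     "owner": "digital-analytics", "description": "Raw site activity"},
    {"name": "claims_2014", "tags": {"finance", "claims"},
     "owner": "finance", "description": "Processed claims"},
]

def find_assets(keyword: str) -> list[str]:
    """Return asset names whose tags or description mention the keyword."""
    keyword = keyword.lower()
    return [a["name"] for a in catalog
            if keyword in a["tags"] or keyword in a["description"].lower()]

print(find_assets("customer"))  # ['customer_master', 'web_clickstream']
```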



Clean-up woman: Part 1

If your enterprise is working with Hadoop, MongoDB or other nontraditional databases, then you need to evaluate your data strategy. A data strategy must adapt to current data trends based on business requirements. So am I still the clean-up woman? The answer is YES!

I still work on the quality of the data. Only now, instead of just for the data warehouse, we interpret and analyze output from our data profiling tools, and we reflect changes back to the enterprise source systems. (Or we should, anyway.)
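For readers who haven't worked with profiling output, here's a minimal Python sketch of the sort of per-column summary a profiling pass produces – the kind of result that can be reviewed and, where it reveals a defect, reflected back to the owning source system. The input extract and its path are hypothetical.

```python
import pandas as pd

# Simple per-column profile: completeness, cardinality and a sample value.
# The input file is a hypothetical extract, not any specific tool's output.
df = pd.read_csv("extracts/source_system_feed.csv")

profile = pd.DataFrame({
    "non_null_pct": (df.notna().mean() * 100).round(1),
    "distinct_values": df.nunique(),
    "sample_value": df.apply(
        lambda col: col.dropna().iloc[0] if col.notna().any() else None),
})
print(profile)
```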

For the most part, all the data strategy work we have completed in the past is still pertinent. What we must understand is that today's data is absorbed in other ways, and much faster than with our traditional data warehousing platforms. As a result, data preparation and integration must happen very quickly. Latency in the data may not be tolerated in these new analytic platforms.

Self-service data access typically happens on nontraditional databases (e.g., Hadoop). Analysis cannot wait for IT to land and prepare the data. It's needed sooner! The teams using this technology are looking for all enterprise data (including data from the data warehouse) as fast as they can get it. In some cases, they're abstracting data that's sent between processes and interpreting results before the process even completes. This, in itself, could change how fast we can do business.

We've always wanted to address data quality issues as close to the source as possible. With today’s technology, we'll be able to absorb data and assess its quality and integrity faster than ever before.

So, cleaning up and profiling data after the process is still required. But our new data strategy must reflect changes back to the source systems, ensuring better quality data for the future. New platforms help to make this possible.


Got 2 minutes? Watch Data Preparation for Analytics in the Age of Big Data.
