What can a garage storage system, kayak and surfboard teach you about data modernization?

Two years ago, I found myself the proud, first-time owner of a garage. My wife and I quickly started adding new items to it – a battery-powered lawn mower, two beach cruisers and four Tommy Bahama beach chairs. They were stored with ease. What a fantastic world I'd been missing out on. But it wasn't long before we outstripped our existing garage storage system's (GSS) capacity by adding new items like a surfboard, soccer goals and kids' bikes. I could squeeze the bikes in there, but the surfboard was much larger than anything else I had stored in this system. Storing and retrieving objects soon became complicated, frustrating and time-intensive.

Next, I found myself fascinated by Rubbermaid FastTrack, a GSS that consists of numerous three- to six-foot bands of steel that you bolt onto the studs in your garage walls. You then hang overpriced hooks, ball bins and bike holders from these steel bands.

As I was hanging these tracks in the 95-degree heat last summer, I had fever dreams about how this garage storage problem resembles the challenges of data modernization – and how, like homeowners, organizations struggle to modernize their data architectures to accommodate new and varying types and sizes of data at unprecedented rates.

Data prep considerations for analytics, Part 2

In my last post, I covered some of the first data preparation questions to ask when going down the analytics road. I was just getting started, though. There are plenty more things to consider in this vein.


Clean-up woman: Part 2

In my last post, I talked about how data still needs to be cleaned up – and data strategy still needs to be re-evaluated – as we start to work with nontraditional databases and other new technologies.

There are lots of ways to use these new platforms (like Hadoop). For example, many organizations are using a data lake as a staging area for minimally processed data. That staging area can also be the place to analyze and profile data – not only for data quality, but for the integrity and completeness of the data between processes. Think of it as an audit area. For example, maybe we made a change to the configuration of a source system, and we want to make sure the data coming out of that system is processed according to the testing and/or use scenarios.
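To make the "audit area" idea concrete, here's a minimal sketch of the kind of check it might run: profile a batch of records at two stages and flag any completeness or integrity drift between them. The record shapes, field names and sample values are hypothetical, not from any particular tool.

```python
# A minimal sketch of audit-area checks between two processing stages.
# Records and field names are hypothetical.

def profile(records, fields):
    """Count rows and missing values per field for a batch of records."""
    stats = {"rows": len(records), "missing": {f: 0 for f in fields}}
    for rec in records:
        for f in fields:
            if rec.get(f) in (None, ""):
                stats["missing"][f] += 1
    return stats

def audit(source_stats, staged_stats):
    """Flag completeness/integrity gaps between source and staged data."""
    issues = []
    if source_stats["rows"] != staged_stats["rows"]:
        issues.append(f"row count drift: {source_stats['rows']} -> {staged_stats['rows']}")
    for f, n in staged_stats["missing"].items():
        if n > source_stats["missing"].get(f, 0):
            issues.append(f"new missing values in '{f}': {n}")
    return issues

source = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": "b@x.com"}]
staged = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]

issues = audit(profile(source, ["id", "email"]),
               profile(staged, ["id", "email"]))
```

In a real data lake, these profiles would come from a profiling tool rather than hand-rolled loops, but the principle – compare stages, surface drift – is the same.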


Data prep considerations for analytics, Part 1

I'm hard-pressed to think of a trendier yet more amorphous term today than analytics. It seems that every organization wants to take advantage of analytics, but few really are doing that – at least to the extent possible. This topic interests me quite a bit, and I hope to explore it more in the fall. (I'll be teaching Enterprise Analytics in the fall as I start my career as a college professor.)

But analytics (in all its forms) is predicated on, you know, data. And, as we know from this blog, data is often not ready to be released into the wild. Many times, machines and people need to massage data first – often to a significant degree. Against that backdrop, in this post and its successor, I'll list some key data preparation questions to think about as they pertain to analytics.


Data cataloging for data asset crowdsourcing

What does it really mean when we talk about the concept of a data asset? For the purposes of this discussion, let's say that a data asset is a manifestation of information that can be monetized. In my last post, we explored how bringing many data artifacts together in a single repository enables the linkage, combination and analysis that can lead to profitable business actions.

On the one hand, the more data that's available, the better the chance of combining multiple artifacts in ways that can be monetized. But at the same time, the more data there is to search through, the more difficult it is to figure out what you need, whether it exists and how it can be used.
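This is the gap a data catalog fills: descriptive metadata that lets users discover whether the data they need exists before anyone requests access to it. A minimal sketch of the idea – the asset names, owners and tags below are entirely hypothetical:

```python
# A toy data catalog: each asset carries searchable metadata.
# All entries are hypothetical examples.

catalog = [
    {"name": "customer_master", "owner": "CRM team",
     "tags": {"customer", "pii", "reference"}},
    {"name": "web_clickstream", "owner": "Digital team",
     "tags": {"customer", "behavioral", "raw"}},
    {"name": "store_sales", "owner": "Finance",
     "tags": {"transactions", "curated"}},
]

def find_assets(catalog, required_tags):
    """Return the names of assets whose tags include every requested tag."""
    required = set(required_tags)
    return [a["name"] for a in catalog if required <= a["tags"]]

matches = find_assets(catalog, {"customer"})
```

A production catalog adds lineage, stewardship and usage terms on top, but tag-based discovery is the core mechanic.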


Clean-up woman: Part 1

If your enterprise is working with Hadoop, MongoDB or other nontraditional databases, then you need to evaluate your data strategy. A data strategy must adapt to current data trends based on business requirements. So am I still the clean-up woman? The answer is YES!

I still work on the quality of the data. Only now, instead of just for the data warehouse, we interpret and analyze output from our data profiling tools, and we reflect changes to the enterprise source systems. (Or we should, anyway.)

For the most part, all the data strategy work we completed in the past is still pertinent. What we must understand is that today's data is absorbed in other ways, and much faster than with our traditional data warehousing platforms. As a result, data preparation and integration must happen very quickly. Latency in the data may not be tolerated on these new analytic platforms.

Self-service data access typically happens on nontraditional databases (e.g., Hadoop). Analysis cannot wait for IT to land the data and prepare it for analysis. It's needed sooner! The teams using this technology want all enterprise data (including data from the data warehouse) as fast as they can get it. In some cases, they're extracting data that's sent between processes and interpreting results before the process even finishes. This, in itself, could change how fast we can do business.
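A minimal sketch of what "interpreting results before the process finishes" can look like: records are validated as they stream through, so a suspect value raises an alert immediately instead of waiting for an after-the-fact batch audit. The record shape and the negative-amount rule are hypothetical.

```python
# Records are checked in flight, mid-pipeline, rather than after landing.
# The feed, fields and validation rule are hypothetical.

def stream_records():
    """Stand-in for a feed arriving from a source system."""
    yield {"order_id": 1, "amount": 25.0}
    yield {"order_id": 2, "amount": -5.0}   # suspect value
    yield {"order_id": 3, "amount": 40.0}

def validate_in_flight(records, alerts):
    """Pass records through unchanged while flagging suspect ones immediately."""
    for rec in records:
        if rec["amount"] < 0:
            alerts.append(rec["order_id"])
        yield rec

alerts = []
landed = list(validate_in_flight(stream_records(), alerts))
```

Because `validate_in_flight` is a generator, the alert for order 2 fires while order 3 is still upstream – the essence of acting on data between processes.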

We've always wanted to address data quality issues as close to the source as possible. With today's technology, we can assess the quality and integrity of the data faster than ever before.

So, cleaning up and profiling data after the process is still required. But our new data strategy must reflect changes back to the source systems, ensuring better quality data for the future. New platforms help to make this possible.

Got 2 minutes? Watch Data Preparation for Analytics in the Age of Big Data.


Data prep and self-service analytics – Turning point for governance and quality efforts?

The demand for data preparation solutions is at an all-time high, and it's primarily driven by the demand for self-service analytics. Ten years ago, a business leader who wanted more in-depth information on a particular KPI would typically issue a reporting request to IT and wait some indeterminate time for a dashboard or report to be constructed – one that hopefully matched the need.

Things are very different today.


Modernization requires unification

"Temporary solutions often become permanent problems."

—Craig Bruce

For a long time now, many large, mature organizations have struggled with effective data integration.

Of course, this should surprise exactly no one. As I have seen firsthand in my career, organizations often cannot rid themselves of legacy applications, technologies, systems and mindsets. (For more on this, see Why New Systems Fail.) And one can easily apply that general statement to data integration tools, many of which organizations deployed in an era of small, structured and predictable data sets.


Who was that masked data?

Data access and data privacy are often fundamentally at odds with each other. Organizations want unfettered access to the data describing their customers. Meanwhile, customers want their data – especially their personally identifiable information – to remain as private as possible.

Organizations need to protect data privacy by only granting data access to authorized business users. But even when data access has been authorized, there are still sensitive aspects of data that should be masked when presented to business users.

Data masking, also referred to as anonymization, obscures data values by replacing them with equivalent but non-sensitive values. The masked values can still be used for operations such as joining relational tables, and for analytics such as representing individuals in time series or transactional data. Masking typically relies on encryption or keyed hashing that is non-reversible without the key, making data far more difficult to decipher when unauthorized access occurs (e.g., a hack or other security breach). By removing, obscuring, aggregating or altering data so that it no longer identifies individuals, masking allows for a much wider – and much safer – use of the information.
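One common way to get "equivalent but non-sensitive" values that still join correctly is deterministic masking with a keyed hash (HMAC): the same input always yields the same token, but without the key the mapping is not practically reversible. A minimal sketch – the key, field names and sample data are hypothetical, and real key management is out of scope:

```python
# Deterministic masking with HMAC-SHA256: stable tokens preserve joins
# without exposing the underlying identifier. Key and data are hypothetical.

import hashlib
import hmac

SECRET_KEY = b"store-this-in-a-key-vault"  # hypothetical; never hard-code in practice

def mask(value: str) -> str:
    """Replace a sensitive value with a stable, non-identifying token."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

customers = [{"ssn": "123-45-6789", "spend": 250}]
orders = [{"ssn": "123-45-6789", "item": "kayak"}]

masked_customers = [{"cust_token": mask(r["ssn"]), "spend": r["spend"]}
                    for r in customers]
masked_orders = [{"cust_token": mask(r["ssn"]), "item": r["item"]}
                 for r in orders]

# The tokens still line up, so the two tables join without exposing the SSN.
joinable = masked_customers[0]["cust_token"] == masked_orders[0]["cust_token"]
```

Note the trade-off: determinism is exactly what makes joins work, but it also means repeated values are linkable, so the key must be guarded as carefully as the data itself.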


Crowdsourcing data assets in the data lake

A long time ago, I worked for a company that had positioned itself as basically a third-party “data trust” to perform collaborative analytics. The business proposition was to engage different types of organizations whose customer bases overlapped, ingest their data sets, and perform a number of analyses using the accumulated data sets. This work could not be done by any of the clients on their own.

By providing a trusted repository for the data, the company ensured that no client would have access to any other client’s data – yet all would benefit from the analytics applied across the board when value could be derived from merging the data sets.
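A minimal sketch of the data-trust mechanic: the repository links two clients' records on a shared customer key and returns only the combined insight, so neither client ever sees the other's raw data. The clients, keys and attributes below are entirely hypothetical.

```python
# A toy "data trust": analyze only the overlap of two clients' customer
# bases, inside the trusted repository. All records are hypothetical.

client_a = {"c1": {"segment": "surf"}, "c2": {"segment": "golf"}}   # e.g., a retailer
client_b = {"c1": {"churn_risk": 0.8}, "c3": {"churn_risk": 0.2}}   # e.g., an insurer

def collaborative_overlap(a, b):
    """Merge attributes only for the customers both clients share."""
    shared = a.keys() & b.keys()
    return {k: {**a[k], **b[k]} for k in shared}

merged = collaborative_overlap(client_a, client_b)

# An insight neither client could compute alone: high-churn-risk surf buyers.
at_risk_surfers = [k for k, v in merged.items()
                   if v["segment"] == "surf" and v["churn_risk"] > 0.5]
```

In practice the shared keys would themselves be masked or tokenized (as discussed in the data masking post) so the trust never holds raw identifiers either.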
