Characteristics of IoT data quality

In my last post, we started to look at two different Internet of Things (IoT) paradigms. The first involved only streaming automatically generated data from machines (such as sensor data). The second combined human-generated and machine-generated data, such as social media updates that are automatically augmented with geo-tag data by a mobile device.

In both of these cases, much of the data is automatically created – so what does it mean to talk about data quality? The answer requires two tasks: a reconsideration of the dimensions of data quality, and a focus on end-user data usability.

Data gone awry, Part 2: What to do when your business data deceives you.

In my last post, I hopefully disabused you of the notion that big data and analytics guarantee successful business decisions and outcomes. Today, I'll discuss what to do once you realize that you bet on the wrong horse.

There are many degrees of "wrong"

Your organization is looking to fill a position. After careful screening, "the data" might suggest that Candidate X is a particularly good fit. The applicant is hired – and fails miserably.

What to do?

There's no one solution here. It's folly to think that all corporate positions entail the same levels of responsibility, publicity and compensation. A mid-level marketing manager is hardly the same type of duck as a C-level appointment. What's more, the consequences of a botched hire can be disastrous. Case in point: In 2014, Yahoo! fired COO Henrique de Castro. His pay for a mere 15 months of work: a mind-boggling $108 million. People immediately began to question then-new CEO Marissa Mayer's judgment.

Simon Says: Depending on the circumstances, applicable labor laws and organizational fallout, you might just have to bite the bullet and cut a check. Try to make a better decision next time, especially for senior-level jobs.

The Internet of Things and the question of data quality

The concept of the Internet of Things (IoT) is used broadly to cover any organization of communication devices and methods, messages streaming from the device pool, data collected at a centralized point, and analysis used to exploit the combined data for business value. But this description hides the richness of a computational model that can be adapted to various functional applications in some very different ways. Let's consider two distinct examples.

Predictive maintenance for a manufacturing facility

In this scenario, factory machines are fitted with IoT devices combining sensors, actuators and communication modules. The sensors monitor ambient measures of the environment (such as temperature, power use, air quality and vibration) and communicate those values to a centralized analytics server. As devices fail, the analytics engine identifies sentinel patterns (such as increased vibration and temperature occurring at the same time) that can be used to indicate imminent failures of similar devices. These patterns can be pushed back down to the devices. And when a pattern occurs, the device itself can alert a technician to schedule a line shutdown, request a replacement part, and bring the factory line back up in a reasonable, predictable time frame. Reducing unscheduled downtime lowers costs while retaining high throughput.
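
To make the sentinel idea concrete, here is a minimal sketch of such a check running on the device side. The thresholds, field names and alert action are all hypothetical; in practice the analytics server would learn these patterns from historical failure data rather than hard-coding them.

    # Minimal sketch of a device-side sentinel check (Python).
    # Thresholds and field names are hypothetical; a real system would
    # derive the pattern from historical failure data on the server.

    VIBRATION_LIMIT_MM_S = 12.0   # vibration velocity, mm/s
    TEMPERATURE_LIMIT_C = 85.0    # bearing temperature, deg C

    def sentinel_triggered(reading):
        """Flag 'elevated vibration AND elevated temperature at the same time'."""
        return (reading["vibration_mm_s"] > VIBRATION_LIMIT_MM_S
                and reading["temperature_c"] > TEMPERATURE_LIMIT_C)

    def monitor(stream):
        """Scan a stream of sensor readings; alert a technician on a match."""
        for reading in stream:
            if sentinel_triggered(reading):
                # In production this would schedule the line shutdown and
                # order the replacement part; here we just print an alert.
                print(f"ALERT: imminent-failure pattern on {reading['machine_id']}")

    if __name__ == "__main__":
        monitor([
            {"machine_id": "press-07", "vibration_mm_s": 4.1, "temperature_c": 61.0},
            {"machine_id": "press-07", "vibration_mm_s": 14.8, "temperature_c": 91.5},
        ])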

Data gone awry, Part 1: Will your business data deceive you? 

Your organization has taken all of the right steps. It went all-in on big data. That is, it incorporated vast amounts of data from new sources, many of which lie outside its control. It deployed Hadoop and encouraged employees to eschew their intuition and make data-based decisions.

And still it completely missed the boat on the viability of a new product, strategy, marketing campaign, direction or partnership. Think New Coke, Apple Maps and Windows Vista bad. Some pretty senior people at your organization aren't just scratching their heads. They're embarrassed, and they're questioning the very idea of big data.

SAS® in the seat? Motorcycle racer goes streaming.

Some people think motorcycle racing is a sport for thrill seekers and adrenaline junkies. So you might be surprised to hear that after many years of racing, I’ve found it to be a contemplative activity, one more about precision and prediction than reckless abandon.

In motorcycle racing, the rider has to make many complex decisions in fractions of a second. Riders have to continually weigh and put into context a multitude of elements coming at them from constantly changing streams of data.

Success relies on making the best predictions as you accurately analyze shifting environmental variables. What was important milliseconds ago can quickly become irrelevant. Riders have to adjust almost instantly. Choose wrong – and you can land in big trouble, literally.

It’s high-speed, high-stakes human decision making in action. Similar, I believe, to a business that has to make vital decisions based on constantly changing streams of data.

Pushing data quality beyond boundaries

Throughout my long career of building and implementing data quality processes, I've consistently been told that data quality could not be implemented within data sources, because doing so would disrupt production systems. Therefore, source data was often copied to a central location – a staging area – where it was cleansed, transformed, deduplicated, restructured and loaded into new applications, such as an enterprise data warehouse or master data management hub.

This paradigm of dragging data from where it lives through data quality processes that exist elsewhere (and whose results are stored elsewhere) had its advantages. But one of its biggest disadvantages was the boundary it created – original data lived in its source, but quality data lived someplace else.
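
As a rough illustration of that staging-area paradigm, here is a hypothetical sketch (the tables, columns and cleansing rules below are invented) in which source data is copied to staging, cleansed and deduplicated there, and loaded onward – while the original records stay untouched at the source:

    # Hypothetical sketch of the staging-area paradigm: copy, cleanse and
    # deduplicate away from the source system, then load the results elsewhere.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE source_customers (id INTEGER, email TEXT);
        INSERT INTO source_customers VALUES
            (1, ' Pat@Example.com '), (2, 'pat@example.com'), (3, 'lee@example.com');

        -- Staging area: a copy, so the production source is not disrupted.
        CREATE TABLE staging_customers AS SELECT * FROM source_customers;

        -- Cleanse in staging (trim, lowercase), then deduplicate on load.
        UPDATE staging_customers SET email = LOWER(TRIM(email));
        CREATE TABLE warehouse_customers AS
            SELECT MIN(id) AS id, email FROM staging_customers GROUP BY email;
    """)

    print(con.execute("SELECT * FROM warehouse_customers ORDER BY id").fetchall())
    # The warehouse now holds quality data, but the source still holds the
    # original, unimproved records -- exactly the boundary described above.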

Can SAS Data Management get you to soccer on time?

A soccer fairy tale

Imagine it's Soccer Saturday. You've got 10 kids and 10 loads of laundry – along with buried soccer jerseys – that you need to clean before the games begin. Oh, and you have two hours to do this. Fear not! You are a member of an advanced HOA (homeowners association) that provides an agile and flexible washing machine architecture (we'll call it AWA, for Agile Washing Architecture) to improve the productivity of all your neighbors.

Whenever anyone in the neighborhood needs to wash a lot of laundry, they engage the AWA – and a large, driverless, Uber-like minivan pulls up alongside the house with 10 advanced upright washing machines and dryers in it (with steam control!). You tell your 10 kids to load one load of laundry each into the 10 machines in the AWA. Two hours later, your family is on the way to 10 soccer games in an extended minivan, with fresh new soccer clothes. Mission accomplished. (Luckily, today they are all playing in the same park.)

When it comes to managing data, this isn't just a fairy tale about advanced HOAs and getting to soccer on time (though both of these scenarios are indeed fairy tales). It's an example of what happens when we can distribute processing in discrete chunks (loads of laundry) across nodes of an Apache Hadoop cluster (washing machines) that run some sort of data cleansing, blending or transformation function (cleaning the laundry). Instead of taking 10 hours to wash 10 loads, it takes 1 hour to wash (and 1 to dry).
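
In code, the analogy looks roughly like this. The sketch below is hypothetical and uses a local Python process pool as a stand-in for the nodes of a Hadoop cluster, but the shape is the same: split the work into discrete chunks and cleanse them all in parallel.

    # Hypothetical sketch of the AWA analogy: one "washing machine" per chunk.
    # A local process pool stands in for the nodes of a Hadoop cluster.
    from multiprocessing import Pool

    def cleanse(chunk):
        """Toy cleansing function: trim whitespace and drop empty records."""
        return [record.strip() for record in chunk if record.strip()]

    if __name__ == "__main__":
        # Ten "loads of laundry": ten chunks of dirty records.
        chunks = [[f"  value {i}.{j}  " for j in range(4)] + ["  "]
                  for i in range(10)]
        with Pool(processes=10) as pool:
            cleaned = pool.map(cleanse, chunks)  # all ten chunks at once
        print(sum(len(c) for c in cleaned), "records cleansed in parallel")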

Health insurance, healthcare and pharmaceutical perspective on data quality – Part 2

As I explained in Part 1 of this series, spelling my name wrong does bother me! However, life changes quickly at health insurance, healthcare and pharmaceutical companies. That said, taking unintegrated or uncleansed data and propagating it to Hadoop may only help one issue: getting the data into the hands of the business user or consumer quickly, based on the business requirements. The data may very well NOT be cleansed or integrated with other systems – but the objective is speed. So let’s define this type of user – and then sketch what serving that user looks like:

  • They need the data as close to real-time as possible.
  • They probably do some type of analytics on the data.
  • They may not need the data cleansed because they are doing some type of fraud analysis, etc.
  • They may not need the data integrated because this business group is targeting the data in one source system or just part of the data in multiple source systems.
  • The data and analysis are NOT going outside of the company.
  • This data usually does not need to be stored for long periods of time, like data warehouse data.
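
As a loose sketch of what serving this user can look like (the file name and record fields below are invented), the whole job is to land the raw records immediately and defer cleansing and integration entirely:

    # Hypothetical sketch: land raw records fast for the speed-first user.
    # No cleansing, no integration -- just append and make it queryable.
    import json, time

    def land_raw(record, path="claims_raw.jsonl"):
        """Append the record as-is, stamped with arrival time, nothing more."""
        record["_arrived_at"] = time.time()
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    # A fraud analyst reads this file directly; a misspelled name or a
    # duplicate row is acceptable here because the objective is speed.
    land_raw({"claim_id": "C-1001", "member": "Jo-Ann Smiht", "amount": 1250.0})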

Which comes first, data quality or data analytics?

While it’s obvious that chickens hatch from eggs that were laid by other chickens, what’s less obvious is which came first – the chicken or the egg? This classic conundrum has long puzzled scientists and non-scientists alike, and Team Chicken (those who believe the chicken came first) has almost as many members as Team Egg (those who believe the egg came first).

It turns out, however, the yolk’s on Team Chicken, since the answer is ... the egg came first.

Data quality and cloud computing: What are the risks?

If I have one gripe about technology today, it's the seemingly universal belief that it raises social and business issues for the first time. Sure, Twitter may be a relatively recent arrival, and only recently have people been fired for tweeting really inappropriate jokes. Still, if you think that losing your job for publicly saying something objectionable is new, think again.

And the same holds true with respect to data quality. Yes, contemporary cloud computing arrived fairly recently, although its roots in grid computing go back decades. Make no mistake, though: the notion that duplicate, erroneous, invalid and incomplete information harms a business is hardly new. It predates today's rampant technology, big data and even the modern-day computer. Bum handwritten general-ledger entries caused problems centuries ago.

This raises the question: What are the data quality risks specific to cloud computing? In a nutshell, I see three.
